All Courses
Advanced25 Chapters

GenAI Evaluation, Safety & Governance

Master LLM evaluation, safety engineering, and AI governance. Covers evaluation dataset curation with NeMo Safe Synthesizer, LLM-as-judge with Vertex AI Eval Service and MLflow MemAlign, RAG evaluation with RAGAS/DeepEval 3.0, observability with Langfuse v3 (OpenTelemetry-native)/Arize Phoenix/Braintrust, agent trajectory scoring with DeepEval @observe and Vertex AI Agent Evaluation and Patronus Percival, human-in-the-loop with Argilla, A/B testing, eval-driven CI/CD with Promptfoo and DeepEval, continuous production monitoring, cross-model comparison with NeMo Evaluator, cost governance with LiteLLM gateway and RouteLLM semantic routing, prompt injection defense with Meta PromptGuard 2 (via Groq API) and LlamaFirewall and Google Model Armor, content safety with Guardrails AI/NeMo Guardrails 0.20/NemoGuard NIMs/LlamaGuard 4, PII redaction with Presidio and Google Sensitive Data Protection, hallucination detection with Patronus AI Lynx 2.0, adversarial testing with PyRIT/Garak 0.14/Promptfoo Hydra/NeMo Auditor, agent safety with CodeShield and MCP security (CoSAI taxonomy), multi-modal safety, vector security, OWASP LLM Top 10 2025 and MITRE ATLAS, EU AI Act (enforcement active), NIST AI RMF/ISO 42001 compliance, red teaming with Meta GOAT and Inspect AI, and bias/fairness monitoring with safety scorecards. All labs run in GKE (Google Kubernetes Engine) pods using hosted LLM APIs (OpenAI, Gemini, Anthropic, LlamaGuard 4 via Together/Groq, PromptGuard 2 via Groq).

EvaluationSafetyRed TeamingGovernanceCompliance

Learning Path

8 phases • 25 chapters
Phase 10/10 chapters

Foundations

Python essentials and development environment for agent development

0/453 quiz questions
0/180 labs

Tools & Topics

Virtual environments, async programming, type hints, Pydantic, error handling, testing, debugging, logging, project structure

Goals

  • Set up professional development environments
  • Write async Python code fluently
  • Use type hints and Pydantic for robust data handling
  • Implement error handling, testing, logging, and debugging

Chapters

1. Evaluation Dataset Curation
2. LLM-as-Judge Evaluation
3. RAG Evaluation with RAGAS & DeepEval
4. Evaluation Observability with Langfuse v3 & OpenTelemetry
5. Agent Trajectory Evaluation
6. Human-in-the-Loop Evaluation
7. A/B Testing for LLM Systems
8. Evaluation-Driven CI/CD & Continuous Production Monitoring
9. Cross-Model Evaluation
10. Cost Governance & Token Budgets
Phase 20/7 chapters

LLM Fundamentals

Core LLM concepts: API clients, token economics, caching, and function calling basics

0/349 quiz questions
0/126 labs

Tools & Topics

LLM APIs, OpenAI/Anthropic/Gemini clients, prompt caching, token economics, function calling basics

Goals

  • Call multiple LLM providers (OpenAI, Anthropic, Gemini)
  • Implement prompt caching and token cost management
  • Build function calling and tool definitions
  • Understand token economics and cost optimization

Chapters

11. Prompt Injection Defense
12. Content Safety Filters
13. PII Detection & Redaction
14. Hallucination Detection
15. Adversarial Robustness Testing
16. Agent Safety, MCP Security & Sandboxing
17. Multi-Modal Safety
Phase 30/8 chapters

Agent Fundamentals

Agent patterns: ReAct, planning, tool execution, sandboxing, web navigation, and MCP protocol

0/399 quiz questions
0/144 labs

Tools & Topics

ReAct loop, planning patterns, tool execution, sandboxing, web navigation, MCP servers, MCP clients, tool routing

Goals

  • Create agent loops with ReAct and planning patterns
  • Build and consume MCP servers for tool integration
  • Implement sandboxing and web navigation
  • Design structured outputs and prompts

Chapters

18. Vector & Embedding Security
19. OWASP LLM Top 10 2025 & MITRE ATLAS
20. EU AI Act Compliance
21. Compliance Frameworks
22. Red Teaming Methodology
23. Bias, Fairness & Continuous Monitoring
24. End-to-End Eval, Safety & Governance Pipeline
25. Enterprise Safety Operations Capstone
Phase 40/0 chapters

Agent State & Memory

Memory systems, RAG patterns, context optimization, and LangGraph state machines

0/0 quiz questions
0/0 labs

Tools & Topics

Short-term memory, long-term memory (RAG), agentic RAG patterns, semantic memory, context optimization, state graphs, conditional edges, checkpointing, human-in-the-loop, streaming, subgraphs

Goals

  • Implement short-term and long-term memory
  • Build RAG and agentic RAG systems
  • Create state machines with LangGraph
  • Implement checkpointing, streaming, and human-in-the-loop

Chapters

Phase 50/0 chapters

Multi-Agent Systems

Multi-agent patterns, guardrails, evaluations, and observability

0/0 quiz questions
0/0 labs

Tools & Topics

Supervisor pattern, hierarchical pattern, reflector pattern, input guardrails, output guardrails, prompt injection defense, evaluations, benchmarking, tracing, observability

Goals

  • Implement supervisor, hierarchical, and reflector patterns
  • Build input and output guardrails
  • Defend against prompt injection attacks
  • Evaluate agents with benchmarks

Chapters

Phase 60/0 chapters

Production & Operations

Production deployment: APIs, containers, databases, scaling, CI/CD, and monitoring

0/0 quiz questions
0/0 labs

Tools & Topics

FastAPI, Docker, production databases, scaling, CI/CD, monitoring, alerting, model routing, fallbacks, system design

Goals

  • Serve agents via FastAPI with Docker
  • Deploy to Kubernetes with CI/CD
  • Monitor with Prometheus/Grafana
  • Build multi-tenant agent platforms

Chapters

Phase 70/0 chapters

Advanced Topics

Alternative frameworks, protocols, specialized agents, autonomous workflows, and cutting-edge capabilities

0/0 quiz questions
0/0 labs

Tools & Topics

CrewAI/AutoGen, A2A protocols, GraphRAG, local models, vision agents, voice agents, code agents, autonomous workflows, streaming data, agent swarms

Goals

  • Use alternative frameworks (CrewAI, AutoGen)
  • Implement A2A protocol for agent communication
  • Build GraphRAG for complex knowledge
  • Build vision, computer use, and voice agents

Chapters

Phase 80/0 chapters

Agent Production Excellence

Production excellence: trajectory evaluation, safety, cost control, enterprise patterns, and governance

0/0 quiz questions
0/0 labs

Tools & Topics

Agent trajectory evaluation, safety boundaries, cost control, enterprise agent patterns, load testing, versioning, fleet dashboards, autonomous agent governance

Goals

  • Score multi-step agent reasoning with LLM-as-judge pipelines
  • Build safety boundaries with permissions and kill switches
  • Implement per-agent cost budgets and cost-aware routing
  • Deploy enterprise agent patterns for document processing and code review

Chapters

© 2026 GenBodha. All rights reserved.