Account

Advanced25 Chapters

GenAI Evaluation, Safety & Governance

Master LLM evaluation, safety engineering, and AI governance. Covers evaluation dataset curation with NeMo Safe Synthesizer, LLM-as-judge with Vertex AI Eval Service and MLflow MemAlign, RAG evaluation with RAGAS/DeepEval 3.0, observability with Langfuse v3 (OpenTelemetry-native)/Arize Phoenix/Braintrust, agent trajectory scoring with DeepEval @observe and Vertex AI Agent Evaluation and Patronus Percival, human-in-the-loop with Argilla, A/B testing, eval-driven CI/CD with Promptfoo and DeepEval, continuous production monitoring, cross-model comparison with NeMo Evaluator, cost governance with LiteLLM gateway and RouteLLM semantic routing, prompt injection defense with Meta PromptGuard 2 (via Groq API) and LlamaFirewall and Google Model Armor, content safety with Guardrails AI/NeMo Guardrails 0.20/NemoGuard NIMs/LlamaGuard 4, PII redaction with Presidio and Google Sensitive Data Protection, hallucination detection with Patronus AI Lynx 2.0, adversarial testing with PyRIT/Garak 0.14/Promptfoo Hydra/NeMo Auditor, agent safety with CodeShield and MCP security (CoSAI taxonomy), multi-modal safety, vector security, OWASP LLM Top 10 2025 and MITRE ATLAS, EU AI Act (enforcement active), NIST AI RMF/ISO 42001 compliance, red teaming with Meta GOAT and Inspect AI, and bias/fairness monitoring with safety scorecards. All labs run in GKE (Google Kubernetes Engine) pods using hosted LLM APIs (OpenAI, Gemini, Anthropic, LlamaGuard 4 via Together/Groq, PromptGuard 2 via Groq).

EvaluationSafetyRed TeamingGovernanceCompliance

Learning Path

8 phases • 25 chapters

Phase 10/10 chapters

Foundations

Python essentials and development environment for agent development

0/453 quiz questions

0/180 labs

Tools & Topics

Virtual environments, async programming, type hints, Pydantic, error handling, testing, debugging, logging, project structure

Goals

•Set up professional development environments
•Write async Python code fluently
•Use type hints and Pydantic for robust data handling
•Implement error handling, testing, logging, and debugging

Chapters

1. Evaluation Dataset Curation

2. LLM-as-Judge Evaluation

3. RAG Evaluation with RAGAS & DeepEval

4. Evaluation Observability with Langfuse v3 & OpenTelemetry

5. Agent Trajectory Evaluation

6. Human-in-the-Loop Evaluation

7. A/B Testing for LLM Systems

8. Evaluation-Driven CI/CD & Continuous Production Monitoring

9. Cross-Model Evaluation

10. Cost Governance & Token Budgets

Phase 20/7 chapters

LLM Fundamentals

Core LLM concepts: API clients, token economics, caching, and function calling basics

0/349 quiz questions

0/126 labs

Tools & Topics

LLM APIs, OpenAI/Anthropic/Gemini clients, prompt caching, token economics, function calling basics

Goals

•Call multiple LLM providers (OpenAI, Anthropic, Gemini)
•Implement prompt caching and token cost management
•Build function calling and tool definitions
•Understand token economics and cost optimization

Chapters

11. Prompt Injection Defense

12. Content Safety Filters

13. PII Detection & Redaction

14. Hallucination Detection

15. Adversarial Robustness Testing

16. Agent Safety, MCP Security & Sandboxing

17. Multi-Modal Safety

Phase 30/8 chapters

Agent Fundamentals

Agent patterns: ReAct, planning, tool execution, sandboxing, web navigation, and MCP protocol

0/399 quiz questions

0/144 labs

Tools & Topics

ReAct loop, planning patterns, tool execution, sandboxing, web navigation, MCP servers, MCP clients, tool routing

Goals

•Create agent loops with ReAct and planning patterns
•Build and consume MCP servers for tool integration
•Implement sandboxing and web navigation
•Design structured outputs and prompts

Chapters

18. Vector & Embedding Security

19. OWASP LLM Top 10 2025 & MITRE ATLAS

20. EU AI Act Compliance

21. Compliance Frameworks

22. Red Teaming Methodology

23. Bias, Fairness & Continuous Monitoring

24. End-to-End Eval, Safety & Governance Pipeline

25. Enterprise Safety Operations Capstone

Phase 40/0 chapters

Agent State & Memory

Memory systems, RAG patterns, context optimization, and LangGraph state machines

0/0 quiz questions

0/0 labs

Tools & Topics

Short-term memory, long-term memory (RAG), agentic RAG patterns, semantic memory, context optimization, state graphs, conditional edges, checkpointing, human-in-the-loop, streaming, subgraphs

Goals

•Implement short-term and long-term memory
•Build RAG and agentic RAG systems
•Create state machines with LangGraph
•Implement checkpointing, streaming, and human-in-the-loop

Chapters

Phase 50/0 chapters

Multi-Agent Systems

Multi-agent patterns, guardrails, evaluations, and observability

0/0 quiz questions

0/0 labs

Tools & Topics

Supervisor pattern, hierarchical pattern, reflector pattern, input guardrails, output guardrails, prompt injection defense, evaluations, benchmarking, tracing, observability

Goals

•Implement supervisor, hierarchical, and reflector patterns
•Build input and output guardrails
•Defend against prompt injection attacks
•Evaluate agents with benchmarks

Chapters

Phase 60/0 chapters

Production & Operations

Production deployment: APIs, containers, databases, scaling, CI/CD, and monitoring

0/0 quiz questions

0/0 labs

Tools & Topics

FastAPI, Docker, production databases, scaling, CI/CD, monitoring, alerting, model routing, fallbacks, system design

Goals

•Serve agents via FastAPI with Docker
•Deploy to Kubernetes with CI/CD
•Monitor with Prometheus/Grafana
•Build multi-tenant agent platforms

Chapters

Phase 70/0 chapters

Advanced Topics

Alternative frameworks, protocols, specialized agents, autonomous workflows, and cutting-edge capabilities

0/0 quiz questions

0/0 labs

Tools & Topics

CrewAI/AutoGen, A2A protocols, GraphRAG, local models, vision agents, voice agents, code agents, autonomous workflows, streaming data, agent swarms

Goals

•Use alternative frameworks (CrewAI, AutoGen)
•Implement A2A protocol for agent communication
•Build GraphRAG for complex knowledge
•Build vision, computer use, and voice agents

Chapters

Phase 80/0 chapters

Agent Production Excellence

Production excellence: trajectory evaluation, safety, cost control, enterprise patterns, and governance

0/0 quiz questions

0/0 labs

Tools & Topics

Agent trajectory evaluation, safety boundaries, cost control, enterprise agent patterns, load testing, versioning, fleet dashboards, autonomous agent governance

Goals

•Score multi-step agent reasoning with LLM-as-judge pipelines
•Build safety boundaries with permissions and kill switches
•Implement per-agent cost budgets and cost-aware routing
•Deploy enterprise agent patterns for document processing and code review