GenAI Agent Engineering

Build autonomous multi-agent systems with planning, reasoning, tool use, memory, MCP/A2A protocols, safety boundaries, and production evaluation.

13 skill groups5 courses790 goals~408 hrs

Verifiable skill graph

13 skill groups · each becomes a signed node on your graph.

Every lab you pass signs a W3C Verifiable Credential on your public skill graph. Completing the labs in each group below mints one node on that graph — the badge you walk away with is a cryptographic record of what you can ship, not a completion certificate.

Share the URL on your résumé or with a hiring manager. They click; they see the discipline, the labs you passed, and the verification signature. No honor system, no broker.

01
Tool Use & MCP

The irreducible core of an agent: function calling, Pydantic-typed tool definitions, multi-tool and parallel calls, sandboxed/safe tool execution, and MCP servers + clients.

02
Reasoning & Planning Patterns

Agent reasoning and planning patterns: ReAct, planner-executor, reflection/self-critique, chain-of-thought, task decomposition, and dynamic re-planning.

03
Agent Evaluation & Testing

The scientific method for non-deterministic agents: trajectory and final-outcome scoring, offline eval sets, LLM-as-judge, regression suites, benchmarks, and test harnesses.

04
Reliability & Failure Recovery

What separates a demo from production: step-level retries, circuit breakers, fallback chains, graceful degradation, structured-output validation + auto-repair, idempotency, and replay/recovery.

05
Durable Execution & Control Flow

Developer-authored agent control flow: state graphs, conditional edges, checkpointers and time-travel, human-in-the-loop gates, streaming, and subgraph composition.

06
Context Engineering & Memory

The core craft of long-running agents: context-window engineering (compaction, pruning, managing context rot), long-term/semantic/episodic memory, summarization, and retrieval/RAG as a tool.

07
Multi-Agent Coordination & Handoffs

Coordinating multiple agents when one isn't enough: supervisor/router patterns, hierarchical organizations, agent-to-agent handoffs and delegation, and result aggregation.

08
Agent Safety & Guardrails

Keeping autonomous agents within bounds: input/output guardrails, prompt-injection defense, PII scrubbing, content moderation, action allow/deny boundaries, and policy enforcement.

09
Operations, Observability & Cost Control

Running agents in production: tracing (OpenTelemetry/Langfuse/Logfire), step-level logging, fleet dashboards and alerting, per-agent token budgets, loop-termination on budget, and versioning/rollback.

10
Specialized Agent Environments

Agents that act in an external environment — computer-use, web-browsing, and code agents — with screenshotting, action verification, and environment recovery.

11
Agent Deployment & Serving

Shipping agents as services: HTTP/WebSocket endpoints (FastAPI), containerization, deploy + autoscale, and CI/CD for agent code.

12
LLM API Foundations

Baseline LLM access from agent code: OpenAI/Anthropic/Gemini SDK calls, auth, structured outputs, sampling control, and unified multi-provider interfaces.

13
Python for Agent Engineering

Production-grade Python applied to agents: async/asyncio, type hints, Pydantic, dataclasses, decorators, context managers — the language fluency to ship agent code.

What you'll ship in production

Core responsibilities this discipline prepares you for.

  1. 1

    Design autonomous GenAI agents

    using state machines with tool calling, memory, and planning

    • Build LangGraph agents from scratch: define graph nodes, conditional edges, state schemas, and checkpointing
    • Progress from simple ReAct agents → planning agents → multi-step agents with persistent memory
    • Apply state machine theory to design agent graphs for complex, real-world task scenarios
  2. 2

    Build multi-agent systems

    with supervisor/worker hierarchies, delegation, and parallel execution

    • Implement supervisor agent patterns that route tasks to specialist worker agents
    • Construct hierarchical team structures with dynamic agent spawning and swarm coordination
    • Monitor cross-agent execution with delegation rules and parallel task orchestration
  3. 3

    Implement MCP servers and clients

    for standardized tool integration

    • Build Model Context Protocol servers that expose REST APIs as discoverable agent tools
    • Implement MCP clients in LangGraph agents with dynamic tool registration and schema negotiation
    • Validate tool selection accuracy across diverse query types and measure invocation reliability
  4. 4

    Enable agent-to-agent communication

    using A2A protocol for cross-framework interoperability

    • Implement A2A v0.3 protocol mechanics: Agent Cards, task lifecycle management, and gRPC transport
    • Build A2A-compatible agents using Google ADK with capability advertising
    • Verify cross-framework interoperability between independently built agent systems
  5. 5

    Build production RAG agents

    with iterative retrieval, self-verification, and query decomposition

    • Add vector search nodes to LangGraph agent graphs with quality-checked retrieval loops
    • Implement query decomposition for complex multi-part questions with iterative refinement
    • Benchmark agentic RAG against static RAG pipelines using faithfulness and relevance metrics
  6. 6

    Implement guardrails and safety controls

    within agent workflows

    • Integrate NeMo Guardrails for content filtering within running agent execution loops
    • Add LlamaFirewall middleware with policy-based tool access control and output filtering
    • Quantify safety-vs-helpfulness tradeoffs using adversarial test suites and scoring rubrics
  7. 7

    Evaluate agent performance

    with trajectory analysis and cost tracking

    • Build evaluation harnesses measuring trajectory quality, tool selection accuracy, and task completion
    • Run agents against standardized test suites and analyze per-task token cost attribution
    • Track agent quality regressions over time with Langfuse observability dashboards
  8. 8

    Design context engineering

    — systematic composition of prompts, memory, tools, and history

    • Structure system prompts, conversation memory windows, and tool result formatting strategies
    • Optimize context window utilization across multi-turn conversations with token budgeting
    • Measure agent behavior differences across context designs using controlled A/B evaluations

Curriculum

5 courses · each builds on previous goals

10 goals unlocked for preview — click to read. Locked goals need a subscription.