GenAI Agent Engineering
Build autonomous multi-agent systems with planning, reasoning, tool use, memory, MCP/A2A protocols, safety boundaries, and production evaluation.
Verifiable skill graph
13 skill groups · each becomes a signed node on your graph.
Every lab you pass signs a W3C Verifiable Credential on your public skill graph. Completing the labs in each group below mints one node on that graph — the badge you walk away with is a cryptographic record of what you can ship, not a completion certificate.
Share the URL on your résumé or with a hiring manager. They click; they see the discipline, the labs you passed, and the verification signature. No honor system, no broker.
The irreducible core of an agent: function calling, Pydantic-typed tool definitions, multi-tool and parallel calls, sandboxed/safe tool execution, and MCP servers + clients.
Agent reasoning and planning patterns: ReAct, planner-executor, reflection/self-critique, chain-of-thought, task decomposition, and dynamic re-planning.
The scientific method for non-deterministic agents: trajectory and final-outcome scoring, offline eval sets, LLM-as-judge, regression suites, benchmarks, and test harnesses.
What separates a demo from production: step-level retries, circuit breakers, fallback chains, graceful degradation, structured-output validation + auto-repair, idempotency, and replay/recovery.
Developer-authored agent control flow: state graphs, conditional edges, checkpointers and time-travel, human-in-the-loop gates, streaming, and subgraph composition.
The core craft of long-running agents: context-window engineering (compaction, pruning, managing context rot), long-term/semantic/episodic memory, summarization, and retrieval/RAG as a tool.
Coordinating multiple agents when one isn't enough: supervisor/router patterns, hierarchical organizations, agent-to-agent handoffs and delegation, and result aggregation.
Keeping autonomous agents within bounds: input/output guardrails, prompt-injection defense, PII scrubbing, content moderation, action allow/deny boundaries, and policy enforcement.
Running agents in production: tracing (OpenTelemetry/Langfuse/Logfire), step-level logging, fleet dashboards and alerting, per-agent token budgets, loop-termination on budget, and versioning/rollback.
Agents that act in an external environment — computer-use, web-browsing, and code agents — with screenshotting, action verification, and environment recovery.
Shipping agents as services: HTTP/WebSocket endpoints (FastAPI), containerization, deploy + autoscale, and CI/CD for agent code.
Baseline LLM access from agent code: OpenAI/Anthropic/Gemini SDK calls, auth, structured outputs, sampling control, and unified multi-provider interfaces.
Production-grade Python applied to agents: async/asyncio, type hints, Pydantic, dataclasses, decorators, context managers — the language fluency to ship agent code.
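As a rough illustration of the reliability skill group above (step-level retries and fallback chains), here is a minimal sketch in plain Python. The names `call_with_retries`, `run_fallback_chain`, `flaky_primary`, and `cheap_fallback` are hypothetical and invented for this example; a real agent would wrap actual model or tool calls and would likely catch narrower exception types.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_retries(step: Callable[[], T], *, attempts: int = 3,
                      base_delay: float = 0.0) -> T:
    """Run one step, retrying with exponential backoff when it raises."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return step()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


def run_fallback_chain(steps: Sequence[Callable[[], T]], *, attempts: int = 3,
                       base_delay: float = 0.0) -> T:
    """Try each step in order; move to the next only after retries run out."""
    errors = []
    for step in steps:
        try:
            return call_with_retries(step, attempts=attempts,
                                     base_delay=base_delay)
        except Exception as exc:
            errors.append(exc)
    raise RuntimeError(f"all {len(steps)} steps failed: {errors!r}")


# Hypothetical stand-ins for a primary model call and a cheaper fallback.
calls = {"primary": 0}


def flaky_primary() -> str:
    calls["primary"] += 1
    raise TimeoutError("primary model timed out")


def cheap_fallback() -> str:
    return "fallback answer"


result = run_fallback_chain([flaky_primary, cheap_fallback], attempts=2)
print(result, calls["primary"])
```

The primary step is attempted twice before the chain degrades to the fallback, which is the behavior the "graceful degradation" wording points at. A circuit breaker would add state that skips the primary entirely after repeated failures; that is left out here to keep the sketch short.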
What you'll ship in production
Core responsibilities this discipline prepares you for.
1. Design autonomous GenAI agents using state machines with tool calling, memory, and planning
- Build LangGraph agents from scratch: define graph nodes, conditional edges, state schemas, and checkpointing
- Progress from simple ReAct agents → planning agents → multi-step agents with persistent memory
- Apply state machine theory to design agent graphs for complex, real-world task scenarios
2. Build multi-agent systems with supervisor/worker hierarchies, delegation, and parallel execution
- Implement supervisor agent patterns that route tasks to specialist worker agents
- Construct hierarchical team structures with dynamic agent spawning and swarm coordination
- Monitor cross-agent execution with delegation rules and parallel task orchestration
3. Implement MCP servers and clients for standardized tool integration
- Build Model Context Protocol servers that expose REST APIs as discoverable agent tools
- Implement MCP clients in LangGraph agents with dynamic tool registration and schema negotiation
- Validate tool selection accuracy across diverse query types and measure invocation reliability
4. Enable agent-to-agent communication using the A2A protocol for cross-framework interoperability
- Implement A2A v0.3 protocol mechanics: Agent Cards, task lifecycle management, and gRPC transport
- Build A2A-compatible agents using Google ADK with capability advertising
- Verify cross-framework interoperability between independently built agent systems
5. Build production RAG agents with iterative retrieval, self-verification, and query decomposition
- Add vector search nodes to LangGraph agent graphs with quality-checked retrieval loops
- Implement query decomposition for complex multi-part questions with iterative refinement
- Benchmark agentic RAG against static RAG pipelines using faithfulness and relevance metrics
6. Implement guardrails and safety controls within agent workflows
- Integrate NeMo Guardrails for content filtering within running agent execution loops
- Add LlamaFirewall middleware with policy-based tool access control and output filtering
- Quantify safety-vs-helpfulness tradeoffs using adversarial test suites and scoring rubrics
7. Evaluate agent performance with trajectory analysis and cost tracking
- Build evaluation harnesses measuring trajectory quality, tool selection accuracy, and task completion
- Run agents against standardized test suites and analyze per-task token cost attribution
- Track agent quality regressions over time with Langfuse observability dashboards
8. Design context engineering: systematic composition of prompts, memory, tools, and history
- Structure system prompts, conversation memory windows, and tool result formatting strategies
- Optimize context window utilization across multi-turn conversations with token budgeting
- Measure agent behavior differences across context designs using controlled A/B evaluations
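To make the token-budgeting idea in the context engineering item concrete, here is a minimal sketch that always keeps the system prompt and then keeps as many of the most recent messages as fit. Everything in it is an assumption made for illustration: the `Message` type, the `fit_to_budget` helper, and especially `estimate_tokens`, which counts whitespace-separated words as a crude stand-in for a real tokenizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    role: str
    content: str


def estimate_tokens(text: str) -> int:
    """Crude placeholder: one token per whitespace-separated word.

    Real tokenizers count differently; swap in the provider's tokenizer.
    """
    return len(text.split())


def fit_to_budget(system: Message, history: List[Message],
                  budget: int) -> List[Message]:
    """Keep the system prompt plus the newest messages that fit the budget."""
    remaining = budget - estimate_tokens(system.content)
    if remaining < 0:
        raise ValueError("system prompt alone exceeds the token budget")

    kept: List[Message] = []
    # Walk from newest to oldest so recent turns win when space runs out.
    for message in reversed(history):
        cost = estimate_tokens(message.content)
        if cost > remaining:
            break  # stop at the first message that does not fit
        kept.append(message)
        remaining -= cost

    # Restore chronological order before handing the context to the model.
    return [system] + kept[::-1]


system = Message("system", "You are helpful")
history = [
    Message("user", "one two three four"),
    Message("assistant", "five six"),
    Message("user", "seven eight nine"),
]
context = fit_to_budget(system, history, budget=8)
print([m.content for m in context])
```

With a budget of 8 estimated tokens, the oldest user turn is dropped and the two newest messages survive. Dropping the oldest turns is only one policy; summarizing them instead (the compaction mentioned in the skill groups) trades extra model calls for retained information.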
Curriculum
5 courses · each builds on previous goals