AI Developer Platform Engineering
Master building internal AI developer platforms. Covers platform vision, platform APIs, self-service portals, LLM gateway management, model registries, multi-tenant architecture, RBAC, resource quotas, cost allocation, onboarding automation, agent runtimes, tool registries, vector DB as a service, evaluation platforms, prompt workspaces, platform monitoring, compliance, change management, analytics, and a platform capstone. All labs run in K8s pods.
Learning Path
8 phases • 20 chapters
Foundations
Python essentials and development environment for agent development
Tools & Topics
Virtual environments, async programming, type hints, Pydantic, error handling, testing, debugging, logging, project structure
Goals
- Set up professional development environments
- Write async Python code fluently
- Use type hints and Pydantic for robust data handling
- Implement error handling, testing, logging, and debugging
LLM Fundamentals
Core LLM concepts: API clients, token economics, caching, and function calling basics
Tools & Topics
LLM APIs, OpenAI/Anthropic/Gemini clients, prompt caching, token economics, function calling basics
Goals
- Call multiple LLM providers (OpenAI, Anthropic, Gemini)
- Implement prompt caching and token cost management
- Build function calling and tool definitions
- Understand token economics and cost optimization
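As a rough illustration of the token economics goal above, the sketch below estimates the cost of one request from token counts. The `Pricing` class, the `request_cost` function, and every price in it are hypothetical placeholders, not real provider rates; it assumes cached input tokens are billed at a separate, lower rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens (not real provider rates)."""
    input_per_mtok: float
    output_per_mtok: float
    cached_input_per_mtok: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    cached_tokens is the part of input_tokens served from a prompt cache
    and is billed at the (assumed cheaper) cached rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * pricing.input_per_mtok
        + cached_tokens * pricing.cached_input_per_mtok
        + output_tokens * pricing.output_per_mtok
    )
    return total / 1_000_000


# Placeholder prices chosen only to make the arithmetic easy to follow.
example = Pricing(input_per_mtok=2.0, output_per_mtok=8.0, cached_input_per_mtok=0.5)
no_cache = request_cost(example, input_tokens=1_000_000, output_tokens=500_000)
full_cache = request_cost(example, input_tokens=1_000_000, output_tokens=500_000,
                          cached_tokens=1_000_000)
```

With these placeholder prices, caching the whole prompt lowers the input share of the bill while the output share stays the same, which is why caching pays off most for long, repeated prompts.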
Agent Fundamentals
Agent patterns: ReAct, planning, tool execution, sandboxing, web navigation, and the Model Context Protocol (MCP)
Tools & Topics
ReAct loop, planning patterns, tool execution, sandboxing, web navigation, MCP servers, MCP clients, tool routing
Goals
- Create agent loops with ReAct and planning patterns
- Build and consume MCP servers for tool integration
- Implement sandboxing and web navigation
- Design structured outputs and prompts
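To make the ReAct goal above concrete, here is a minimal sketch of the reason-act-observe loop. It uses no real LLM: `scripted_model` is a hypothetical stand-in that returns canned steps, and the `Action: name[arg]` text format, the `TOOLS` table, and the `react_loop` function are illustrative assumptions rather than any framework's API.

```python
from typing import Callable, Dict, List

# Hypothetical tool table: each tool takes a string argument and returns a string.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
}


def scripted_model(history: List[str]) -> str:
    """Stand-in for an LLM call: act first, then answer from the observation."""
    observations = [line for line in history if line.startswith("Observation:")]
    if not observations:
        return "Action: add[2,3]"
    return "Final Answer: " + observations[-1].removeprefix("Observation:").strip()


def react_loop(question: str, model: Callable[[List[str]], str],
               tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Alternate model steps and tool calls until the model gives a final answer."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(history)
        history.append(step)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):
            # Parse "Action: name[arg]" into a tool name and its argument.
            name, _, arg = step.removeprefix("Action:").strip().partition("[")
            arg = arg.rstrip("]")
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool: {name}"
            history.append(f"Observation: {observation}")
    raise RuntimeError("no final answer within max_steps")


answer = react_loop("What is 2 + 3?", scripted_model, TOOLS)
```

The `max_steps` cap is the design choice worth noting: without it, a model that never emits a final answer would loop forever.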
Agent State & Memory
Memory systems, RAG patterns, context optimization, and LangGraph state machines
Tools & Topics
Short-term memory, long-term memory (RAG), agentic RAG patterns, semantic memory, context optimization, state graphs, conditional edges, checkpointing, human-in-the-loop, streaming, subgraphs
Goals
- Implement short-term and long-term memory
- Build RAG and agentic RAG systems
- Create state machines with LangGraph
- Implement checkpointing, streaming, and human-in-the-loop
Multi-Agent Systems
Multi-agent patterns, guardrails, evaluations, and observability
Tools & Topics
Supervisor pattern, hierarchical pattern, reflector pattern, input guardrails, output guardrails, prompt injection defense, evaluations, benchmarking, tracing, observability
Goals
- Implement supervisor, hierarchical, and reflector patterns
- Build input and output guardrails
- Defend against prompt injection attacks
- Evaluate agents with benchmarks
Production & Operations
Production deployment: APIs, containers, databases, scaling, CI/CD, and monitoring
Tools & Topics
FastAPI, Docker, production databases, scaling, CI/CD, monitoring, alerting, model routing, fallbacks, system design
Goals
- Serve agents via FastAPI with Docker
- Deploy to Kubernetes with CI/CD
- Monitor with Prometheus/Grafana
- Build multi-tenant agent platforms
Advanced Topics
Alternative frameworks, protocols, specialized agents, autonomous workflows, and cutting-edge capabilities
Tools & Topics
CrewAI/AutoGen, A2A protocols, GraphRAG, local models, vision agents, voice agents, code agents, autonomous workflows, streaming data, agent swarms
Goals
- Use alternative frameworks (CrewAI, AutoGen)
- Implement A2A protocol for agent communication
- Build GraphRAG for complex knowledge
- Build vision, computer-use, and voice agents
Agent Production Excellence
Production excellence: trajectory evaluation, safety, cost control, enterprise patterns, and governance
Tools & Topics
Agent trajectory evaluation, safety boundaries, cost control, enterprise agent patterns, load testing, versioning, fleet dashboards, autonomous agent governance
Goals
- Score multi-step agent reasoning with LLM-as-judge pipelines
- Build safety boundaries with permissions and kill switches
- Implement per-agent cost budgets and cost-aware routing
- Deploy enterprise agent patterns for document processing and code review