GenAI Application Engineering

Build production RAG & prompt chain applications, design streaming chat UIs, implement guardrails & evaluation, optimize LLM inference costs, and deploy on Kubernetes.

Preview 14 goals free

12 skill groups · 8 courses · 732 goals · ~342 hrs

Verifiable skill graph

12 skill groups · each becomes a signed node on your graph.

Every lab you pass signs a W3C Verifiable Credential on your public skill graph. Completing the labs in each group below mints one node on that graph — the badge you walk away with is a cryptographic record of what you can ship, not a completion certificate.

Share the URL on your résumé or with a hiring manager. They click; they see the discipline, the labs you passed, and the verification signature. No honor system, no broker.

RAG Features & Retrieval Integration

Wire retrieval into a feature: query an existing index, assemble retrieved context under a token budget, show citations, and judge answer quality in the feature loop — consuming the index the data engineer builds, not building it.
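
As a rough illustration of assembling retrieved context under a token budget, the sketch below greedily packs the highest-scoring chunks and numbers them for citation. The `Chunk` type, the `assemble_context` helper, and the whitespace-based `count_tokens` are assumptions made for this example, not part of the course material; a real feature would use the provider's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str   # citation label, e.g. a document title or URL
    text: str
    score: float  # retrieval similarity; higher is more relevant


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a rough stand-in for a real tokenizer.
    return len(text.split())


def assemble_context(chunks: list[Chunk], budget: int) -> tuple[str, list[str]]:
    """Greedily pack the highest-scoring chunks that fit within the token budget."""
    parts: list[str] = []
    citations: list[str] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = count_tokens(chunk.text)  # the "[n]" marker overhead is ignored here
        if used + cost > budget:
            continue  # skip chunks that would overflow; a smaller one may still fit
        citations.append(chunk.source)
        parts.append(f"[{len(citations)}] {chunk.text}")
        used += cost
    return "\n".join(parts), citations
```

The returned citation list lines up with the `[n]` markers in the context, so the UI can map each marker in the answer back to its source.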

Prompt Chains & LLM Orchestration

Compose the feature's logic: multi-step prompt chains, conditional routing, map-reduce/refine, and rock-solid structured output (JSON-mode, schema-constrained generation, parse + validate + retry). Deterministic flows you author — not autonomous agent loops.
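
One way to picture the parse + validate + retry loop is the sketch below. The `call_model` callable, the ticket schema, and the re-prompt wording are all hypothetical stand-ins; in practice the validation step would more likely be a Pydantic model or a provider's schema-constrained mode.

```python
import json
from typing import Callable


def validate_ticket(data: object) -> dict:
    """Check the parsed JSON against a small hand-written schema (hypothetical)."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("title"), str):
        raise ValueError("'title' must be a string")
    if data.get("priority") not in {"low", "medium", "high"}:
        raise ValueError("'priority' must be low, medium, or high")
    return data


def structured_call(call_model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Parse and validate the model's reply, re-prompting with the error on failure."""
    message = prompt
    for _ in range(max_attempts):
        raw = call_model(message)
        try:
            return validate_ticket(json.loads(raw))
        except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
            message = f"{prompt}\n\nYour last reply was invalid ({err}). Return only valid JSON."
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```

Feeding the validation error back into the next prompt gives the model a concrete reason for the retry, and the attempt cap keeps the flow deterministic and bounded.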

Feature Eval & Pre-Ship Quality

Is the feature good enough to ship? Offline eval sets, golden datasets, LLM-as-judge for feature quality, A/B tests, and regression suites that gate the merge — pre-ship, in the dev loop.
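
A regression suite that gates the merge can be as small as a pass-rate check over a golden dataset. The sketch below assumes a hypothetical `feature` callable, a `judge` callable (which could wrap an LLM-as-judge or an exact-match check), and a 90% default threshold chosen only for illustration.

```python
from typing import Callable


def run_eval_gate(
    golden_set: list[dict],
    feature: Callable[[str], str],
    judge: Callable[[str, str], bool],
    min_pass_rate: float = 0.9,
) -> tuple[bool, float]:
    """Score the feature on every golden case and compare against the threshold."""
    if not golden_set:
        raise ValueError("golden set is empty")
    passed = sum(judge(feature(case["input"]), case["expected"]) for case in golden_set)
    pass_rate = passed / len(golden_set)
    return pass_rate >= min_pass_rate, pass_rate
```

In CI, a failing first element would typically translate into a non-zero exit code so the merge is blocked.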

Per-Feature Cost & Latency

The cost/latency calls a feature author makes in app code: per-feature model selection, prompt-token reduction, response caching, retrieval-k trade-offs, and streaming for perceived latency.
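
Response caching, in its simplest form, is an exact-match lookup keyed on model and prompt. The sketch below is an in-memory version under that assumption; the `ResponseCache` class, its TTL default, and the `call_model` signature are illustrative names, and a production feature would more likely back this with Redis or a semantic cache.

```python
import hashlib
import time
from typing import Callable


class ResponseCache:
    """Exact-match cache keyed on model + prompt, with a time-to-live."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self.store: dict[str, tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model: str, prompt: str) -> str:
        # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, call_model: Callable[[str, str], str]) -> str:
        k = self.key(model, prompt)
        entry = self.store.get(k)
        now = self.clock()
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        response = call_model(model, prompt)  # only pay for the call on a miss
        self.store[k] = (now, response)
        return response
```

Tracking hits and misses alongside the cache makes it possible to estimate how much spend the cache is actually saving per feature.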

Streaming & Conversation State

The backend of a chat experience: SSE/WebSocket streaming, stop/regenerate, session and history state, and conversation memory (rolling summarization, history truncation) across multi-turn dialogue.
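
History truncation can be sketched as keeping the system prompt plus the newest turns that fit a token budget. The message shape (`role` / `content` dicts) and the whitespace-based `count_tokens` below are assumptions for illustration; rolling summarization would replace the dropped turns with a summary instead of discarding them.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace word count approximates the provider's tokenizer.
    return len(text.split())


def truncate_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep system messages plus the newest turns that fit in the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    used = sum(count_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break  # stop at the first turn that does not fit, so history stays contiguous
        kept.append(message)
        used += cost
    return system + kept[::-1]  # restore chronological order
```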

In-App Guardrails & Content Safety

Keep the feature safe in the request path: input/output validation, prompt-injection defense, PII redaction, content moderation, grounding checks, and refusal/fallback UX.
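
As a minimal sketch of PII redaction in the request path, the example below masks email addresses and phone-like digit runs with regular expressions. The two patterns and the `redact_pii` name are assumptions for illustration only; real coverage (names, addresses, national IDs) usually calls for a dedicated detection library or model.

```python
import re

# Assumption: two simple patterns for illustration; real PII coverage needs far more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace emails and phone-like digit runs with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

Running the same function on both the user input and the model output keeps PII out of prompts, logs, and responses alike.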

Application Backend & API Design

The backend substrate every feature sits on: FastAPI endpoints, request/response schemas, async handling, session management, auth, and multimodal input handling.

Function Calling in App Code

Wire tools into a feature the developer controls: structured tool definitions, calling external APIs/DBs, and folding results into the response. Bounded calls you orchestrate — not autonomous agent loops.
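
The "bounded calls you orchestrate" idea might look like the sketch below: a registry of structured tool definitions and a dispatcher that runs at most a fixed number of requested calls. The registry layout, the `run_tool_calls` helper, and the result message shape are hypothetical and do not match any one provider's wire format.

```python
import json
from typing import Callable

# Hypothetical tool registry: name -> (JSON-schema-style definition, Python callable).
TOOLS: dict[str, tuple[dict, Callable[..., object]]] = {}


def register(name: str, description: str, parameters: dict, fn: Callable[..., object]) -> None:
    TOOLS[name] = ({"name": name, "description": description, "parameters": parameters}, fn)


def run_tool_calls(tool_calls: list[dict], max_calls: int = 3) -> list[dict]:
    """Execute at most max_calls requested tools and return results as messages."""
    results = []
    for call in tool_calls[:max_calls]:
        entry = TOOLS.get(call["name"])
        if entry is None:
            content = {"error": f"unknown tool {call['name']!r}"}
        else:
            try:
                content = {"result": entry[1](**call["arguments"])}
            except Exception as err:  # report the failure back instead of crashing the request
                content = {"error": str(err)}
        results.append({"role": "tool", "name": call["name"], "content": json.dumps(content)})
    return results
```

The hard cap on calls per turn is what separates this from an autonomous agent loop: the application, not the model, decides how much work a single request may trigger.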

Feature Telemetry & Trace Logging

See what your feature is doing: per-feature and per-conversation tracing, prompt/completion logging, user-feedback capture, and feature-level latency/error dashboards.

GenAI Feature Release & CI

Ship the feature safely: app build/test/deploy pipeline, eval-in-CI merge gates, prompt-as-code versioning, and feature flags for model/prompt rollout.

Hosted LLM API Integration

Baseline LLM access in app code: provider SDK calls, auth, structured-output and streaming primitives, and multimodal inputs.

Python for Application Engineering

Production Python for application code: async/await, Pydantic, typing, dataclasses, and error handling.

What you'll ship in production

Core responsibilities this discipline prepares you for.

1. Design and build production GenAI features (chatbots, search, summarization) into web applications
- Build streaming chat UIs with FastAPI backends using SSE and WebSocket transports
- Wire React frontends to LLM-powered APIs with end-to-end full-stack integration
- Deploy complete GenAI applications from prototype to production on Kubernetes
2. Implement RAG pipelines with vector databases for enterprise search and knowledge retrieval
- Build end-to-end RAG: document chunking → embedding generation → pgvector storage → LangGraph retrieval nodes
- Validate retrieval accuracy using RAGAS metrics and implement self-verification loops
- Benchmark chunking strategies and HNSW/IVFFlat index types against precision-recall tradeoffs
3. Optimize LLM inference for latency, cost, and reliability across multiple providers
- Configure multi-provider routing with LiteLLM gateway including load balancing and failover
- Implement semantic caching with Redis + embedding similarity to reduce costs by 40%+
- Extract structured outputs with Pydantic AI and handle provider-specific error recovery
4. Integrate LLM APIs (OpenAI, Gemini, Anthropic) into existing applications with error handling
- Connect to OpenAI, Anthropic, and Gemini APIs with streaming, function calling, and embeddings
- Build FastAPI rate limiting middleware with exponential backoff and retry logic
- Navigate provider contract differences across authentication, token limits, and response formats
5. Build GenAI agent features with tool calling, function execution, and human-in-the-loop workflows
- Design LangGraph state machines with structured tool calling and JSON schema validation
- Implement MCP tool integration for dynamic tool discovery and execution
- Wire interruptible agent workflows with human approval gates and checkpoint persistence
6. Evaluate model outputs using automated metrics and LLM-as-judge for production quality
- Build evaluation pipelines using RAGAS faithfulness/relevance metrics and DeepEval harnesses
- Integrate LLM-as-judge scoring into CI/CD gates for automated quality control
- Track quality metrics over time with Langfuse dashboards and regression detection
7. Deploy and containerize GenAI applications on Kubernetes with CI/CD
- Containerize FastAPI + LLM applications with multi-stage Docker builds
- Deploy to Kubernetes with Helm charts, readiness probes, and Ingress configuration
- Automate rollouts with ArgoCD GitOps workflows and Kustomize environment overlays
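
The exponential backoff and retry logic listed under LLM API integration can be sketched in plain Python as follows. `RateLimitError` is a hypothetical stand-in for a provider's HTTP 429 exception, and the delay values are illustrative; the curriculum itself covers the tenacity library for the same pattern.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn on RateLimitError, doubling the delay cap each attempt (full jitter)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter spreads out retries from many clients
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.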

Curriculum

8 courses · each builds on previous goals

14 goals are unlocked for preview; locked goals need a subscription.

Course · Goals

Python Essentials for Agent Builders · 62 goals

Your Dev Environment · 4 goals

Navigate filesystem with terminal
Manage files from command line
Set up VS Code
Configure terminal in VS Code

Python, Git & Package Management · 6 goals

Install and verify Python
Write hello world script
Use Python REPL
Initialize Git repository
Track changes with Git
Install packages with pip

Variables & Basic Types · 5 goals

Create and name variables
Work with strings
Work with numbers
Work with booleans
Format with f-strings

Control Flow · 4 goals

Make decisions with if/elif/else
Iterate with for loops
Repeat with while loops
Control loop execution

Functions · 5 goals

Define and call functions
Use parameters
Return values
Document with docstrings
Understand scope

Modules & Imports · 4 goals

Import standard library
Create custom modules
Understand Python path
Create packages

Lists & Tuples · 5 goals

Create and access lists
Modify lists
Slice lists
Use list comprehensions
Work with tuples

Dictionaries & Sets · 5 goals

Create and access dicts
Modify dictionaries
Iterate over dicts
Work with nested dicts
Use sets

Classes & Dataclasses · 5 goals

Understand class basics
Create dataclasses
Add methods
Use default values
Basic inheritance

Files, JSON & Error Handling · 5 goals

Read and write files
Work with JSON
Use pathlib
Handle exceptions
Create custom exceptions

Basic Testing · 4 goals

Use assert statements
Create test functions
Run pytest
Test classes

Environment Variables & Configuration · 5 goals

Understand environment variables
Use .env files
Load with python-dotenv
Handle missing variables
Organize configuration

Decorators & Context Managers · 5 goals

Understand decorators
Write simple decorators
Use context managers
Write context managers
Combine patterns

LLM Foundations for Agent Builders · 87 goals

Generators & Iterators · 5 goals

Understand iteration
Create generators
Use generator expressions
Build data pipelines
Use itertools

Async Programming Basics · 5 goals

Understand async concepts
Write async functions
Run concurrent operations
Use async context managers
Handle async exceptions

Type Hints & Pydantic · 5 goals

Add basic type hints
Use typing generics
Create Pydantic models
Validate API data
Configure Pydantic

Data Pipelines & Transformations · 5 goals

Build functional pipelines
Work with tabular data
Transform data shapes
Process LLM data formats
Optimize for performance

HTTP Clients & httpx · 5 goals

Make GET requests
Make POST requests
Use async httpx
Handle errors
Use sessions

Your First LLM Call · 5 goals

Set up credentials
Install Gemini SDK
Make first API call
Parse response
Handle API errors

Tokenizer Internals · 5 goals

Understand tokenization basics
Learn BPE algorithm
Compare tokenizer types
Analyze cross-language efficiency
Count and optimize tokens

Context Windows, KV-Cache & Memory · 6 goals

Understand context limits
Understand KV-cache basics
Use prompt caching
Track conversation tokens
Implement truncation
Summarize for compression

LLM Architectures - Dense, MoE & KV-Cache Optimizations · 6 goals

Understand transformers
Explore dense models
Understand MoE
Master KV-cache optimizations
Compare architectures
Choose for task

LLM Inference - Prefill & Decode · 5 goals

Understand prefill phase
Understand decode phase
Run local models
Measure inference metrics
Optimize for latency

Sampling Parameters & Output Control · 5 goals

Understand temperature
Use top-p sampling
Implement determinism
Control output length
Use structured output

Multi-Provider & Prompt Engineering · 5 goals

Build provider abstraction
Structure conversations
Use few-shot prompting
Implement chain-of-thought
Build prompt templates

Function Calling Fundamentals · 5 goals

Understand tool use concept
Define tool schemas
Make function calls
Handle tool responses
Compare provider patterns

Embeddings & Semantic Search · 5 goals

Understand embeddings
Generate embeddings
Calculate similarity
Build simple search
Compare embedding models

RAG Fundamentals · 5 goals

Understand RAG pattern
Chunk documents
Build retrieval pipeline
Compose RAG prompts
Evaluate RAG quality

Cost Awareness & Token Economics · 5 goals

Understand pricing models
Calculate request costs
Compare provider costs
Identify cost drivers
Basic cost optimization

Retry Patterns with Tenacity · 5 goals

Understand retry need
Use tenacity basics
Implement exponential backoff
Handle specific exceptions
Combine with async

Kubernetes Essentials for GenAI · 60 goals

Containerizing LLM Applications · 6 goals

Write a Python app that calls the Gemini API and returns structured responses
Write a Dockerfile and build a container image for the LLM app
Run the containerized LLM app with environment-based configuration
Use Docker Compose to run the LLM app with supporting services
Tag images with semantic versions and push to a container registry
Debug containers with exec, logs, and inspect

Your Kubernetes Cluster & First LLM Pod · 6 goals

Understand K8s architecture and connect to your vCluster
Deploy the LLM app as your first Kubernetes pod
Organize workloads with namespaces
Use labels and selectors to organize and query resources
Understand pod lifecycle and restart policies
Master kubectl debugging: exec, logs, describe, port-forward

Services & the LLM Chat Backend · 6 goals

Create a ClusterIP service to expose the LLM chat API internally
Deploy a multi-tier LLM chat application
Compare service types: ClusterIP, NodePort, LoadBalancer
Master DNS-based service discovery in Kubernetes
Understand endpoints and traffic routing
Debug service connectivity problems

Deployments, Scaling & Rolling Updates · 6 goals

Create a Deployment for the LLM chat API
Scale LLM app replicas to handle concurrent requests
Perform a rolling update with zero downtime
Roll back a broken deployment
Compare deployment strategies: RollingUpdate vs Recreate
Manage deployment lifecycle with kubectl rollout

ConfigMaps & Secrets for LLM Apps · 6 goals

Create ConfigMaps for LLM app settings
Mount ConfigMaps as files for complex configuration
Store LLM proxy credentials securely in Secrets
Manage per-environment configuration for dev, staging, and prod
Handle configuration updates and rolling restarts
Debug configuration issues in LLM app pods

Persistent Storage & StatefulSets · 6 goals

Create PersistentVolumeClaims for durable storage
Deploy PostgreSQL as a StatefulSet
Connect the LLM chat API to PostgreSQL for conversation persistence
Deploy Redis as a StatefulSet for LLM response caching
Understand StatefulSet scaling and ordering guarantees
Manage PVC lifecycle: expansion, snapshots, and cleanup

Packaging with Helm & Kustomize · 6 goals

Create a Helm chart for the LLM chat application
Parameterize the chart with values.yaml for each environment
Manage Helm release lifecycle: install, upgrade, rollback
Use Kustomize bases and overlays for the LLM app
Use Kustomize patches and generators
Compare Helm vs Kustomize for different deployment scenarios

Networking, Ingress & TLS · 6 goals

Expose the LLM chat API via an Ingress resource
Add TLS to the Ingress for HTTPS access
Isolate services with NetworkPolicies
Configure Ingress annotations for production traffic
Understand K8s networking: pod IPs, CNI, and service routing
Debug networking and connectivity issues

Health Probes, Autoscaling & Self-Healing · 6 goals

Add liveness and readiness probes to the LLM chat API
Configure startup probes for containers with slow initialization
Scale the chat API automatically with HPA based on CPU
Create PodDisruptionBudgets for safe maintenance
Implement health check patterns for LLM-dependent services
Combine autoscaling, probes, and PDBs for a resilient LLM service

RBAC, Security & K8s Troubleshooting · 6 goals

Create RBAC roles for the LLM chat application
Enforce Pod Security Standards
Apply SecurityContext for defense in depth
Debug CrashLoopBackOff and OOMKilled failures
Use kubectl debug and ephemeral containers for live debugging
Troubleshoot LLM-specific issues: timeouts, proxy errors, stale connections

Web APIs for GenAI Engineers · 60 goals

FastAPI Fundamentals · 6 goals

Create a FastAPI application with path operations
Define Pydantic request and response models
Implement dependency injection for shared resources
Build CRUD endpoints with proper HTTP semantics
Configure OpenAPI documentation with examples
Handle errors with custom exception handlers

Async Python for APIs · 6 goals

Convert sync endpoints to async with proper await patterns
Implement background tasks for non-blocking operations
Execute concurrent API calls with asyncio.gather
Manage application lifecycle with lifespan handlers
Build async generators for streaming responses
Control concurrency with semaphores and throttling

Database Integration · 6 goals

Configure SQLAlchemy async engine with connection pooling
Define ORM models with relationships and constraints
Create and manage database migrations with Alembic
Implement repository pattern for data access
Build transactional endpoints with session lifecycle
Implement filtering, sorting, and full-text search

Authentication & Authorization · 6 goals

Implement user registration with password hashing
Build OAuth2 password flow with JWT tokens
Implement API key authentication for services
Enforce role-based access control with permissions
Build token refresh and revocation
Compose multiple auth strategies into dependencies

Real-time Streaming · 6 goals

Build SSE endpoint for streaming LLM responses
Implement WebSocket endpoint with connection lifecycle
Build WebSocket connection manager for broadcasting
Handle backpressure and slow clients
Implement heartbeat and automatic reconnection
Build real-time notification system with Redis pub/sub

Resilience Patterns · 6 goals

Implement rate limiting with Redis sliding window
Build circuit breaker for LLM provider calls
Configure retry logic with tenacity
Isolate critical paths with bulkhead semaphores
Build fallback responses for degraded mode
Combine resilience patterns into middleware stack

API Gateway & Routing · 6 goals

Build reverse proxy with path-based routing
Implement load balancing across backend instances
Transform requests and responses through the gateway
Aggregate responses from multiple backends
Implement service discovery with health checking
Build gateway authentication and request enrichment

Testing & Documentation · 6 goals

Write async endpoint tests with httpx.AsyncClient
Build database fixtures with transaction rollback
Mock external services for deterministic tests
Implement contract tests for API consumers
Measure test coverage and set quality gates
Generate rich OpenAPI documentation with examples

API Versioning & Evolution · 6 goals

Implement URL-based API versioning with routers
Build header-based version negotiation
Manage deprecation with Sunset and Warning headers
Build request and response adapters for version translation
Detect breaking changes automatically
Generate API changelogs from schema diffs

Deployment & Observability · 6 goals

Build production Docker images with multi-stage builds
Deploy to Kubernetes with health check probes
Instrument endpoints with Prometheus metrics
Implement distributed tracing with OpenTelemetry
Build structured logging with correlation IDs
Create Grafana dashboards for API monitoring

GenAI Inference Engineering · 76 goals

Production Hosted LLM Architecture · 5 goals

Provider Landscape Analysis
Request Patterns and Streaming
Token Economics
Rate Limit Management
Secure API Management

Prompt Caching Mastery · 6 goals

Anthropic Prompt Caching
OpenAI Automatic Caching
Google Context Caching
Prompt Structure Optimization
Cache Hit Monitoring
Multi-Turn Conversation Caching

Batch API Strategies · 6 goals

OpenAI Batch API
Anthropic Message Batches
Google Gemini Batch Processing
Batch vs Real-Time Decisions
Combined Caching and Batching
Batch Job Orchestration

Model Routing and Selection · 6 goals

Model Tier Strategy
Query Complexity Classification
Routing Frameworks
Quality-Aware Routing
Cost-Quality Optimization
A/B Testing Models

Structured Outputs · 6 goals

OpenAI Structured Outputs
Anthropic Tool Use with Strict Mode
Gemini Structured Outputs
Schema Design Best Practices
Error Handling and Validation
Cross-Provider Structured Abstraction

Function Calling and Tool Orchestration · 6 goals

Tool Schema Design
Multi-Tool Orchestration
Anthropic Tool Use Patterns
Parallel Tool Execution
Tool Caching Optimization
Tool Error Recovery

Reasoning Models · 6 goals

OpenAI Reasoning Strategies
Anthropic Extended Thinking
Gemini Thinking Mode
Reasoning Model Router
Reasoning ROI Analysis
Reasoning Prompt Engineering

Test-Time Compute Strategies · 6 goals

Chain-of-Thought Prompting
Self-Consistency Voting
Tree-of-Thought Exploration
Self-Reflection and Correction
Best-of-N with Scoring
Compute Budget Optimization

Agentic Patterns · 6 goals

ReAct Pattern
Planner-Executor Pattern
Actor-Critic Pattern
Multi-Agent Orchestration
State Management
Human-in-the-Loop Patterns

Context and Memory Management · 6 goals

Context Window Analysis
Selective Context Injection
Rolling Summarization
RAG vs Long Context
Dual-Layer Memory
Context Cache Optimization

Reliability and Resilience · 6 goals

Rate Limit Handling
Circuit Breakers
Multi-Provider Fallback
Timeout Management
Error Classification
Health Monitoring

Cost Optimization and Observability · 5 goals

Token Optimization
Combined Cost Strategies
Cost Attribution
Usage Forecasting
Quality Monitoring

Production Streaming Patterns · 6 goals

SSE Event Formatting
Bounded Token Buffer
Stream Checkpoint and Resume
Stream Broadcasting
Stream Metrics Collection
End-to-End Stream Pipeline

Agent Hosted Models · 227 goals

The LLM Client · 7 goals

OpenAI client setup
Anthropic client setup
Google Gemini client setup
Build a unified LLM client interface
Error handling and provider fallback
Async LLM client patterns
Practical use cases — security, parameters, observability

Token Economics · 7 goals

Understand tokenization
Count tokens across providers
Cost forecasting and budgeting
Track LLM API usage in production
Implement budget controls
Optimize tokens
Advanced context engineering

Prompt Caching · 4 goals

Implement Anthropic cache_control
Leverage OpenAI automatic caching
Design cache-friendly prompt architectures
Build cache monitoring systems

The Function Caller · 7 goals

OpenAI function schemas
Anthropic function schemas
Gemini function schemas
Handle tool call responses
Execute tools safely with Pydantic validation
Handle parallel tool calls
Framework integration with LangGraph

The Tool Definer · 7 goals

Write clear tool descriptions for LLMs
Define parameter schemas
Use Pydantic for tool schemas
Implement tool decorators
Handle complex parameter types
Validate tool inputs at runtime
Framework tool patterns — LangGraph, CrewAI, OpenAI, Gemini, Anthropic

The Raw Agent Loop · 7 goals

The core agent while-loop
Manage context as a mutable list
Handle stop sequences
Track iteration limits
Tool execution in the loop
Build a conversation state tracker
Build with LangGraph StateGraph

The Prompt Engineer (Dynamic) · 6 goals

Master Jinja2 templating for prompts
Implement dynamic few-shot example selection
Enforce Chain-of-Thought reasoning
Structure system prompts with a builder pattern
Inject dynamic context into prompts safely
Build prompt versioning and A/B testing

The ReAct Pattern (Manual) · 6 goals

Build the Thought-Action generator
Tool execution and observation injection
Complete ReAct agent implementation
Advanced ReAct patterns — validation, retry, confidence
Optimize ReAct performance
Common ReAct pitfalls and solutions

The Planner Pattern · 7 goals

Plan generation
Step execution
Dynamic replanning
Hierarchical planning
Plan optimization
Monitoring and observability
Practical considerations — strategy selection

The Pydantic Tool · 7 goals

Pydantic fundamentals for tool definitions
Generate JSON Schema from Pydantic models
Input validation with custom validators
Build a Pydantic tool library
Advanced Pydantic patterns
Integrate Pydantic tools with agent frameworks
Common pitfalls and solutions

The Safe Executor (Sandboxing) · 5 goals

Understand code execution risks
Static code analysis
Sandboxed execution
Apply resource limits
Build a complete safe executor

The Web Navigator · 5 goals

Web navigation fundamentals
Web navigation tools — locating elements and forms
Browser automation with Playwright
Session management
Complete web navigator system

The MCP Protocol (Basics) · 4 goals

JSON-RPC 2.0 message format and handler
Transport mechanisms — stdio and HTTP/SSE
Protocol lifecycle — initialization, runtime, shutdown
Capability negotiation

The MCP Server · 6 goals

Create an MCP server with lifecycle management
Define MCP tools
Implement MCP resources
Create prompt templates
Error handling in MCP servers
Composable MCP server architecture

The MCP Client · 6 goals

MCP client architecture and stdio transport
Discover available tools and translate schemas
Proxy tool invocation
Fetch and use MCP resources
Manage MCP server lifecycle
Build multi-server MCP clients

The Tool Router · 5 goals

Tool routing architecture and implementation
Namespace-based routing
Capability-based routing
Fallback chains
Routing performance optimization

Short-Term Memory · 8 goals

Sliding window memory
Token-aware memory management
Message summarization strategies
Memory persistence layers
Memory retrieval optimization
Integrate memory with agents
Memory performance considerations
Non-functional requirements (privacy + safety)

Long-Term Memory (RAG) · 6 goals

Document chunking strategies
Embedding pipelines
Vector database integration
Hybrid search implementation
Retrieval optimization
RAG response generation

Agentic RAG Patterns · 5 goals

Self-reflective RAG
Multi-hop retrieval
Query routing
Adaptive retrieval
Retrieval feedback loops

Semantic Memory · 6 goals

Knowledge extraction pipelines
Entity and relationship extraction
Knowledge graph construction
Memory consolidation
Integrate semantic memory with agents
Build semantic memory with LangGraph

Context Optimizer · 6 goals

Context economics
Dynamic context prioritization
Context compression techniques
Prompt optimization
Context utilization metrics
Complete context optimizer

The State Graph · 5 goals

StateGraph fundamentals — config and lifecycle
Design state schemas with TypedDict
Add nodes to StateGraph
State initialization patterns
Tracing, debugging, validation

The Conditional Edge · 5 goals

Understand conditional edges
Design routing functions
Fan-out and fan-in patterns
Handle unknown routes and errors
Multi-stage routing

The Checkpointer (Time Travel) · 4 goals

Resumable workflows
Inspect, replay, and time-travel
Retention, large state, and performance
Thread management — IDs and namespaces

Human-in-the-Loop · 6 goals

LangGraph interrupt patterns
Approval workflow patterns
Interactive agent conversations
Feedback integration
State management for HITL
Practical use cases — escalation and analytics

The Streaming Agent · 6 goals

Streaming modes in LangGraph
Token streaming from LLMs
Custom events with `astream_events`
Build streaming APIs
Error handling in streams
Backpressure and flow control

The Subgraph (Composition) · 7 goals

Subgraph fundamentals — compile + test in isolation
State schema mapping
Subgraph checkpointers + namespace isolation
Compose subgraphs into a parent
Catch subgraph exceptions and recover
Define subgraph interfaces and build a registry
Build a multi-agent orchestrator

The Supervisor Pattern · 7 goals

Design supervisor architectures
Worker agent specialization
Build the complete supervisor graph
Manage inter-agent communication
Handle failures and edge cases
Implement task aggregation
Build the supervisor pattern with CrewAI

The Hierarchical Pattern · 4 goals

Design hierarchical agent architectures
Implement team-lead agents
Build cross-team coordination
Build the complete hierarchical graph

The Reflector Pattern (Critique) · 6 goals

Design reflection architectures
Implement critic agents
Build the evaluation and convergence system
Build the complete reflection graph
Handle reflection edge cases
Practical use cases for reflection

Input Guardrails · 6 goals

Design layered guardrail architectures
Format and schema validation
Build content filtering systems
Create injection / jailbreak detection
Implement policy-based guardrails
Assemble the complete guardrail system

Output Guardrails · 6 goals

Design output validation architectures
Implement factual validation (hallucination detection)
Build content safety filters
Create PII redaction
Implement policy compliance
Assemble the complete output guardrail system

Prompt Injection Defense · 7 goals

Identify injection vulnerabilities
Detect direct injections
Detect indirect injections
Implement defense layers
Build red-team suites
Implement canary tokens
LangGraph injection defense pipeline

Evaluations (Evals) · 6 goals

Design evaluation frameworks
Implement automated evaluation pipelines
Create task-specific metrics
Human evaluation protocols
Regression testing
Set baselines and track progress

Agent Benchmarking · 6 goals

Understand the GAIA benchmark
Implement ToolBench evaluation
Use AgentBench
Design domain-specific benchmarks
Cross-model performance comparison
Build benchmark dashboards

Tracing & Observability · 6 goals

Understand distributed tracing
Add tags and metadata
Context propagation
Build feedback collection
Integrate with Langfuse
Trace visualization

Tool Use Debugging · 6 goals

Tool selection failures and solutions
Argument validation systems
Build tool use dashboards and visualization
Schema mismatch detection
Tool call replay
Interactive tool debugger

Serving Agents (FastAPI) · 7 goals

Async endpoints, request validation, error handling
Server-Sent Events (SSE) streaming
Background tasks
Design request and response schemas
Authentication — API keys, middleware, errors
OpenAPI metadata and documentation
FastAPI + LangGraph + uvicorn deployment

GenAI Operations · 24 goals

GenAI CI/CD Pipelines · 6 goals

Build Argo Workflow templates for GenAI artifact CI/CD
Implement pipeline stages: lint, validate, eval, promote
Create artifact-specific pipelines for prompts, models, and RAG configs
Monitor pipeline health with observability metrics
Implement performance optimization for CI/CD pipelines for GenAI artifacts
Build operational documentation for CI/CD pipelines for GenAI artifacts

Eval Gate Pipeline · 6 goals

Build Promptfoo Eval Suites for Pre-Promotion Quality Verification
Implement Eval Gates in Argo Workflows That Block Promotion on Failure
Create Golden Test Sets for Regression Detection
Track Eval Pass Rates and Gate Effectiveness Metrics
Implement performance optimization for automated eval gates
Build operational documentation for automated eval gates

Distributed LLM Tracer · 6 goals

Deploy an OpenTelemetry Collector with Langfuse Exporter
Instrument Multi-Provider Request Chains with Parent-Child Trace Spans
Build Trace Correlation Across RAG Retrieval, LLM Inference, and Guardrail Processing
Create Trace-Based Latency Analysis Dashboards with Drill-Down Capability
Implement performance optimization for end-to-end LLM tracing
Build operational documentation for end-to-end LLM tracing

GenAI Alert System · 6 goals

Configure Alertmanager with GenAI-Specific Routing Rules and Severity Classification
Deploy Grafana OnCall for On-Call Schedules, Escalation Policies, and Incident Lifecycle
Implement Alert Deduplication and Grouping for Noisy GenAI Metrics
Build Alert Effectiveness Tracking to Reduce Alert Fatigue
Implement performance optimization for alerting strategy
Build operational documentation for alerting strategy

Full Stack GenAI Applications · 90 goals

Chat Completion API with Streaming · 5 goals

Build a FastAPI SSE streaming response endpoint
Implement an OpenAI GPT-4o streaming adapter
Implement a Gemini 2.5 Flash streaming adapter with thinking budget
Implement an Anthropic Claude streaming adapter
Build a Llama 4 Maverick streaming adapter via Together.ai

Multi-Provider LLM Gateway with LiteLLM · 5 goals

Build an LLM gateway with LiteLLM acompletion
Extract structured output with Instructor + Pydantic
Build a compound AI model router by request complexity
Configure LiteLLM Router with fallback and health monitoring
Build a usage logging system with token + cost capture

Context Engineering & Conversation Memory · 5 goals

Build a conversation CRUD API with PostgreSQL
Integrate Mem0 as a persistent user-scoped memory layer
Build a context window composer with token budgets
Implement Anthropic prompt caching with cache_control markers
Build a Redis session cache for active conversation context

Message Processing Pipeline · 5 goals

Build a structured content extractor with Instructor
Build a code block processor with language detection
Build a code validator with Gemini ToolCodeExecution
Build a citation processor with URL validation
Build a streaming content assembler emitting typed blocks

Feedback, Regeneration & Edit APIs · 5 goals

Build a feedback collection API
Build a Pydantic AI regeneration agent with typed tools
Build an edit-and-resubmit handler with branch pruning
Build an A/B model comparison endpoint
Build feedback analytics with materialized views

File Upload & Document Processing · 5 goals

Build a multipart file upload endpoint with type/size validation
Build a document parser with Unstructured.io
Build a web ingestion service with Crawl4AI
Build a document chunking pipeline (recursive, semantic, token-aware)
Build a context injection service with relevance ranking

Multi-Modal Input/Output APIs · 5 goals

Build a vision analysis API with GPT-4o + Gemini concurrently
Build a Gemini grounded-generation endpoint with Google Search
Build an audio transcription endpoint with Whisper
Build content safety middleware with Llama Guard 4
Build a multi-modal response assembler with unified schema

MCP, Tool Execution & Agentic Backends · 5 goals

Build an MCP server exposing business logic as tools
Build an MCP client in FastAPI
Build a Pydantic AI agent with typed tools and DI
Build an agentic loop executor with SSE-streamed steps
Build a Google ADK agent with MCP + multi-agent delegation

Real-Time Collaboration Backend · 5 goals

Build a WebSocket connection manager with JWT auth
Build shared conversation sessions for multi-user AI
Build presence tracking with Redis sorted sets
Build a concurrent message handler with queuing
Build an event broadcast system with Redis pub/sub

Authentication, Safety & Guardrails · 5 goals

Build JWT auth with refresh-token rotation and Redis sessions
Build conversation ownership + RBAC sharing
Configure NeMo Guardrails with Colang 2.0
Build Llama Guard 4 content classifier
Build LlamaFirewall jailbreak detection with PromptGuard 2

Data Modeling for AI Applications · 5 goals

Design User/Org/Conversation/Message models with SQLAlchemy 2.0 async
Build polymorphic file/attachment + embedding models
Build usage tracking with monthly table partitioning
Build async Alembic migration pipeline
Build async repository pattern with unit-of-work

API Gateway with Rate Limiting & Guardrails · 5 goals

Build request validation middleware
Build Redis-backed token-bucket rate limiter
Build content-addressable response cache
Build gateway-level guardrails with audit logging
Build K8s liveness/readiness probes with dependency monitoring

Hybrid RAG Backend with Vector Search · 5 goals

Build a RAG document ingestion pipeline (Crawl4AI + Unstructured)
Build hybrid retrieval (semantic + BM25 + reranking)
Orchestrate RAG with LlamaIndex Workflows
Build an agentic RAG agent with Pydantic AI
Evaluate RAG quality with RAGAS metrics

Cost Tracking, Caching & Budget Enforcement · 5 goals

Build token counting middleware with tiktoken
Maximize prompt cache hits and track caching savings
Build a semantic cache with Redis + embedding similarity
Build budget enforcement with alerts + automatic model downgrade
Build cost analytics with materialized views

Testing & Evaluation for GenAI APIs · 5 goals

Build a streaming SSE test harness across 4 LLM providers
Configure a Promptfoo CI/CD evaluation pipeline
Build a RAGAS evaluation suite
Build DeepEval benchmark suites with GEval/LLM-as-judge
Execute security red-teaming campaigns

Observability with Langfuse & OpenTelemetry · 5 goals

Instrument LiteLLM calls with Langfuse traces
Build OpenTelemetry distributed trace pipelines
Manage prompt template versions with Langfuse
Correlate user feedback with Langfuse traces
Use Pydantic AI + Logfire as an alternative observability stack

Performance Optimization & Load Testing · 5 goals

Architect prompt cache strategies for Anthropic + OpenAI
Compile optimized prompts with DSPy
Build async connection pools with FastAPI lifespan
Build batch request processors with backpressure
Build Locust load tests for GenAI APIs

Production Deployment on Cloud Run & GKE · 5 goals

Build multi-stage Docker images for FastAPI AI apps
Deploy FastAPI to Cloud Run with auto-scaling
Configure GKE deployments with HPA on custom metrics
Deploy NVIDIA NIM for self-hosted Llama 4 with LiteLLM routing
Deploy MCP tool servers as sidecars with external-secrets-operator
