LLMOps Engineering
Monitor hallucination rates and token costs, operate guardrails and eval gates, manage prompt versioning and canary deployments.
Verifiable skill graph
12 skill groups · each becomes a signed node on your graph.
Every lab you pass signs a W3C Verifiable Credential on your public skill graph. Completing the labs in each group below mints one node on that graph — the badge you walk away with is a cryptographic record of what you can ship, not a completion certificate.
Share the URL on your résumé or with a hiring manager. They click; they see the discipline, the labs you passed, and the verification signature. No honor system, no broker.
Is the answer good? Hallucination/groundedness detection, eval-gate pipelines that block bad deploys, quality-drift detection, RAGAS/DeepEval, LLM-as-judge (and validating the judge), and RAG retrieval-quality monitoring.
Keep it cheap: per-request cost attribution, token-budget controllers and gates, cache economics, model-tier arbitrage, batch scheduling, and capacity forecasting, all aimed at driving spend down. Per-tenant chargeback belongs to the platform engineer.
Keep it not-on-fire: incident command, triage and post-mortems, runbook automation, on-call and alerting hygiene, error-budget policy, and provider-outage failover/fallback drills.
Keep it compliant: audit engines and trails, EU AI Act / SOC2 / HIPAA / GDPR controls, policy gates, bias and fairness monitoring, and regulatory reporting.
Safely change what's in prod: prompt/model versioning and registry, provenance/lineage, canary analysis, automated rollback triggers, progressive delivery, feature flags, and shadow/replay testing.
Is the service up, fast, and within SLO? OpenTelemetry tracing, Langfuse, Prometheus/Grafana, latency/throughput/error telemetry, and SLI/SLO engines.
Operate the safety layer in production: runtime guardrails with canary/kill-switch, key rotation, PII detection, prompt-injection monitoring, content-safety filters, and red-team automation.
Ship safely and repeatably: GitHub Actions/ArgoCD GitOps, Infrastructure-as-Code, secrets management, environment promotion, and deployment automation.
Operate the serving substrate: Helm, multi-tenant isolation and quota, HPA, gateway/guardrail sidecars, and namespace policy.
Keep the retrieval substrate up in production: health checks, failure detection, recovery, and capacity/staleness alarms — operating it, not building it.
Baseline LLM access in ops code: provider SDKs, multi-provider gateways, smart routers, and provider failover.
Production Python for ops tooling: async/await, Pydantic, typing, dataclasses, pytest, and error handling.
What you'll ship in production
Core responsibilities this discipline prepares you for.
1. Design CI/CD pipelines for LLM application deployment
- Build ArgoCD GitOps workflows with Helm-based deployments and environment promotion
- Implement canary and blue-green rollout strategies with automated quality-based rollback
- Wire complete CI/CD pipelines that trigger rollbacks when evaluation metrics degrade
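As a rough illustration of a quality-based rollback trigger, the sketch below compares canary evaluation scores against a baseline and signals a rollback when any metric drops too far. The metric names, the `EvalScores` type, and the 0.05 tolerance are assumptions made for this example, not part of any particular CI/CD tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalScores:
    """Hypothetical evaluation scores in the range 0.0-1.0 (higher is better)."""
    groundedness: float
    answer_relevance: float


def should_rollback(baseline: EvalScores, canary: EvalScores,
                    max_drop: float = 0.05) -> bool:
    """Return True when any canary metric falls more than max_drop below baseline."""
    drops = (
        baseline.groundedness - canary.groundedness,
        baseline.answer_relevance - canary.answer_relevance,
    )
    return any(drop > max_drop for drop in drops)
```

A pipeline step could call `should_rollback` after the canary evaluation run and fail the job when it returns True, leaving the actual rollback to the deployment tool.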
2. Monitor LLM systems in production: latency, errors, costs, quality
- Instrument with OpenTelemetry and Langfuse v3 for OTEL-native distributed tracing
- Build Grafana dashboards with Logfire for Python application monitoring and alerting
- Set up monitoring stacks that detect anomalies, fire alerts, and enable trace-based root cause analysis
3. Manage LLM gateway operations: key rotation, failover, quota management
- Operate LiteLLM gateway: API key lifecycle management, provider health monitoring, per-team quotas
- Handle zero-downtime model version switching with traffic draining and validation
- Simulate provider outages and quota exhaustion to validate failover and degradation behavior
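A failover drill can be rehearsed without a real gateway. The sketch below tries providers in priority order and falls through to the next one on any error. The function name, the `(name, callable)` provider shape, and the exception class are hypothetical and do not reflect the LiteLLM API.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has errored."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, completion)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

To simulate an outage, pass a primary callable that raises (for example a `TimeoutError`) and check that the secondary provider serves the request.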
4. Implement FinOps practices: cost attribution, budgets, and optimization
- Track token costs by team, feature, and model with Prometheus-based budget alerting
- Implement cost optimization through semantic caching, model tiering, and prompt compression
- Build FinOps dashboards that demonstrate measurable cost reduction across optimization strategies
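Cost attribution by team, feature, and model comes down to pricing each request and summing per key. The sketch below shows one way to do that. The model names and per-1,000-token prices are placeholders invented for the example, and the request record fields are an assumed shape.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical prices in USD per 1,000 tokens; real prices vary by provider.
PRICE_PER_1K = {
    "small-model": {"input": 0.001, "output": 0.002},
    "large-model": {"input": 0.01, "output": 0.03},
}


def attribute_costs(requests: Iterable[dict]) -> Dict[Tuple[str, str, str], float]:
    """Sum request cost in USD per (team, feature, model)."""
    totals: Dict[Tuple[str, str, str], float] = defaultdict(float)
    for req in requests:
        price = PRICE_PER_1K[req["model"]]
        cost = (req["input_tokens"] / 1000 * price["input"]
                + req["output_tokens"] / 1000 * price["output"])
        totals[(req["team"], req["feature"], req["model"])] += cost
    return dict(totals)
```

The resulting totals could be exported as labeled metrics so that budget alerts fire per team or feature rather than on the aggregate bill.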
5. Build continuous evaluation pipelines for production LLM quality
- Run RAGAS and DeepEval evaluation pipelines alongside production traffic as shadow evaluators
- Set up Langfuse-based quality tracking with automated quality gates and threshold alerting
- Detect quality degradation in real time and trigger automated alerts when scores drop below baselines
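One simple way to detect scores dropping below a baseline is a rolling-window mean compared against a tolerance. The sketch below assumes per-response quality scores between 0.0 and 1.0 arrive one at a time. The class name, window size, and tolerance are illustrative choices, not a Langfuse, RAGAS, or DeepEval feature.

```python
from collections import deque


class QualityDriftMonitor:
    """Flags degradation when the rolling mean falls below baseline - tolerance."""

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 50):
        self.baseline = baseline
        self.tolerance = tolerance
        self.scores = deque(maxlen=window)  # keeps only the most recent scores

    def observe(self, score: float) -> bool:
        """Record a score; return True if an alert should fire."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet to judge
        mean = sum(self.scores) / len(self.scores)
        return mean < self.baseline - self.tolerance
```

Waiting for a full window before alerting trades detection speed for fewer false alarms; a smaller window reacts faster but is noisier.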
6. Detect and respond to prompt attacks and safety incidents in production
- Monitor NeMo Guardrails operationally for prompt injection and jailbreak detection patterns
- Classify incident severity and execute structured response workflows with containment procedures
- Simulate attack scenarios end-to-end: detection, triage, remediation, and post-incident analysis
7. Manage data quality for RAG systems: freshness, drift, accuracy
- Monitor embedding drift and retrieval accuracy with continuous RAGAS evaluation
- Set up automated reindexing triggers and stale content detection pipelines
- Build monitoring for live RAG systems that detects quality degradation and triggers reindexing workflows
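Stale content detection can start from two timestamps per document: when it was last indexed and when its source last changed. The sketch below flags documents whose source is newer than the index entry or whose index entry exceeds a maximum age. The function name and the dictionary inputs are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def find_stale_documents(
    indexed_at: Dict[str, datetime],
    source_updated_at: Dict[str, datetime],
    max_age: timedelta,
    now: datetime,
) -> List[str]:
    """Return sorted IDs of documents that should be reindexed."""
    stale = []
    for doc_id, indexed in indexed_at.items():
        changed = source_updated_at.get(doc_id)
        source_is_newer = changed is not None and changed > indexed
        too_old = now - indexed > max_age
        if source_is_newer or too_old:
            stale.append(doc_id)
    return sorted(stale)
```

The returned IDs could feed a reindexing queue, and the count of stale documents makes a natural alarm metric.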
8. Implement capacity planning: predict demand and right-size deployments
- Forecast token demand using historical usage patterns and run load tests for LLM services
- Model SLA capacity requirements and configure KEDA-based autoscaling policies
- Run load tests that predict capacity requirements and validate SLA compliance under variable traffic
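As a back-of-the-envelope version of demand forecasting and right-sizing, the sketch below projects the next period's peak token rate from a moving average and converts it into a replica count with headroom. The function names, the moving-average method, and the headroom figure are assumptions for this example; a real forecast would likely account for seasonality and be checked against load-test results.

```python
import math
from typing import Sequence


def forecast_next(history: Sequence[float], window: int = 3,
                  growth: float = 1.0) -> float:
    """Forecast the next value as the mean of the last `window` points times growth."""
    if not history:
        raise ValueError("history must not be empty")
    recent = history[-window:]
    return sum(recent) / len(recent) * growth


def replicas_needed(peak_tokens_per_sec: float,
                    per_replica_tokens_per_sec: float,
                    headroom: float = 0.3) -> int:
    """Replicas required to serve the peak with a safety margin."""
    required = peak_tokens_per_sec * (1 + headroom)
    return math.ceil(required / per_replica_tokens_per_sec)
```

For example, a history of 1000, 1200, 1400, and 1600 tokens per second with a window of 3 forecasts 1400; with 25% headroom and 500 tokens per second per replica, that rounds up to 4 replicas.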
Curriculum
8 courses · each builds on previous goals