GenAI Safety & Evaluation Engineering

Design automated LLM evaluation pipelines, red-team GenAI systems, build bias detection and fairness benchmarks, implement guardrails.

12 skill groups · 7 courses · 702 goals · ~306 hrs

Verifiable skill graph

12 skill groups · each becomes a signed node on your graph.

Every lab you pass issues a signed W3C Verifiable Credential on your public skill graph. Completing the labs in each group below mints one node on that graph. The badge you walk away with is a cryptographic record of what you can ship, not a completion certificate.

Share the URL on your résumé or with a hiring manager. They click; they see the discipline, the labs you passed, and the verification signature. No honor system, no broker.

01
Python for Safety & Evaluation

Production-grade Python for eval tooling: async/await, Pydantic models for eval rubrics, typing, dataclasses, pytest harnesses, parametrized testing patterns.

02
Hosted LLM API Integration

Provider SDK integration in eval and safety code: judge models, multi-provider scoring, cross-model evaluation harnesses, multi-provider abstraction (LiteLLM).

03
Eval Dataset Design & Curation

Golden-set curation, dataset versioning, prompt-variant generation, edge-case mining, stratified sampling, dataset bias auditing, eval-set hygiene.

04
LLM-as-Judge & Scoring

LLM-as-judge rubric design, position-bias correction, calibration against human raters, scoring functions, judge reproducibility, multi-judge ensembling.
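As a rough illustration of position-bias correction, one common approach is to ask the judge twice with the answer order swapped and accept a winner only when both runs agree. The sketch below assumes a hypothetical `judge` callable that returns "A", "B", or "tie" for whichever answer sits in the first or second slot; the two toy judges are stand-ins, not real model calls.

```python
from typing import Callable, Literal

Verdict = Literal["A", "B", "tie"]
Judge = Callable[[str, str, str], Verdict]  # (prompt, slot_a, slot_b) -> verdict


def debiased_verdict(judge: Judge, prompt: str, answer_1: str, answer_2: str) -> str:
    """Query the judge in both presentation orders and keep only consistent wins."""
    first = judge(prompt, answer_1, answer_2)   # answer_1 shown in slot A
    second = judge(prompt, answer_2, answer_1)  # answer_1 shown in slot B
    if first == "A" and second == "B":
        return "answer_1"
    if first == "B" and second == "A":
        return "answer_2"
    # Verdicts that flip with the order (or explicit ties) count as a tie.
    return "tie"


def always_first(prompt: str, a: str, b: str) -> Verdict:
    """Toy judge with maximal position bias: always prefers slot A."""
    return "A"


def prefers_longer(prompt: str, a: str, b: str) -> Verdict:
    """Toy judge with no position bias: prefers the longer answer."""
    if len(a) == len(b):
        return "tie"
    return "A" if len(a) > len(b) else "B"
```

A judge that always favors the first slot collapses to a tie under this scheme, while an order-independent judge keeps its verdict.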

05
RAG & Cross-Model Evaluation

RAGAS + DeepEval + TruLens pipelines, retrieval relevance + faithfulness + answer-relevancy metrics, cross-model comparison harnesses, model ranking.

06
Agent Trajectory Evaluation

Step-level agent eval, trajectory scoring, tool-call accuracy, plan-quality assessment, human-in-the-loop review gates, golden-trajectory datasets.
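To make tool-call accuracy concrete, here is a minimal sketch that compares an agent's recorded tool calls against a golden trajectory, step by step. The `ToolCall` type and the metric names are hypothetical, and strict positional alignment is an assumption; a real harness might allow reordering or partial credit.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)


def tool_call_accuracy(golden: list[ToolCall], actual: list[ToolCall]) -> dict[str, float]:
    """Score a trajectory against a golden one, aligned by step index."""
    # Missing or extra steps count against the score via the longer length.
    steps = max(len(golden), len(actual))
    if steps == 0:
        return {"tool_accuracy": 1.0, "exact_accuracy": 1.0}
    tool_hits = 0   # right tool at this step
    exact_hits = 0  # right tool and identical arguments
    for expected, observed in zip(golden, actual):
        if expected.tool == observed.tool:
            tool_hits += 1
            if expected.args == observed.args:
                exact_hits += 1
    return {
        "tool_accuracy": tool_hits / steps,
        "exact_accuracy": exact_hits / steps,
    }
```

Separating tool selection from argument correctness helps distinguish planning failures from parameter-filling failures.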

07
Continuous Eval & CI/CD Gates

A/B testing for LLM systems, eval-driven CI/CD pipelines, quality gates in deployment, regression suites, prompt-variant champion/challenger.
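A quality gate of this kind can be sketched as a comparison between a stored baseline and the scores from a candidate run. The metric names, the `max_drop` tolerance of 0.02, and the function names below are illustrative assumptions, not values prescribed by the curriculum.

```python
def gate_failures(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_drop: float = 0.02,  # assumed tolerance; tune per metric in practice
) -> list[str]:
    """Return a human-readable failure for every metric that regressed."""
    failures = []
    for metric, base_score in baseline.items():
        score = candidate.get(metric)
        if score is None:
            failures.append(f"{metric}: missing from candidate run")
        elif base_score - score > max_drop:
            failures.append(f"{metric}: {base_score:.3f} -> {score:.3f}")
    return failures


def exit_code(failures: list[str]) -> int:
    """Map gate results to a process exit code a CI runner can act on."""
    return 1 if failures else 0
```

In a pipeline step, the script would print the failures and call `sys.exit(exit_code(failures))`, so a non-zero status blocks the deployment.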

08
Hallucination & Bias Detection

Hallucination detectors (grounding checks, NLI-based), bias and fairness metrics, demographic-parity tests, disparate-impact analysis, continuous fairness monitoring.
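For a sense of what a demographic-parity test and a disparate-impact check compute, the sketch below derives per-group favorable-outcome rates and two summary statistics from them. The record format `(group, outcome)` and the function names are assumptions made for illustration; the 0.8 threshold mentioned afterward is the widely cited four-fifths rule of thumb, not a universal standard.

```python
from collections import defaultdict
from typing import Iterable


def selection_rates(records: Iterable[tuple[str, int]]) -> dict[str, float]:
    """Favorable-outcome rate per group; outcome is 1 (favorable) or 0."""
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        positives[group] += int(outcome)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(rates: dict[str, float]) -> float:
    """Gap between the highest and lowest group rates (0.0 means parity)."""
    return max(rates.values()) - min(rates.values())


def disparate_impact_ratio(rates: dict[str, float]) -> float:
    """Lowest group rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0
```

A ratio below 0.8 is often treated as a flag for closer review, and tracking both numbers over time is one simple form of continuous fairness monitoring.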

09
Content Safety & PII Defense

Content moderation, PII detection + redaction, toxicity classifiers, sensitive-data filters, output sanitization, regulated-content classification.
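As a minimal sketch of PII detection plus redaction, the example below replaces matches with typed placeholders and reports how many of each were found. The two regular expressions are deliberately simple illustrative assumptions (email addresses and one US-style phone format); a production pipeline would typically rely on a dedicated detector with far broader coverage.

```python
import re

# Illustrative patterns only: they miss many real-world formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each detected entity with <LABEL> and count detections."""
    counts: dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, found = pattern.subn(f"<{label}>", text)
        counts[label] = found
    return text, counts
```

Returning counts alongside the redacted text makes it straightforward to log detection rates without logging the sensitive values themselves.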

10
Adversarial Robustness & Red Teaming

Prompt injection defense, jailbreak detection, adversarial robustness testing, automated red teaming, OWASP LLM Top 10 + MITRE ATLAS, attack scenario engineering.

11
Compliance & Governance Frameworks

EU AI Act compliance, SOC 2/HIPAA/GDPR frameworks, regulatory artifact generation, AI risk classification, governance committee workflows, end-to-end eval+safety+governance pipelines.

12
Eval Observability & Cost Governance

Langfuse + OpenTelemetry for eval traces, eval-pipeline dashboards, cost governance for eval runs, token budget controllers for judge models, eval-cost FinOps.
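A token budget controller for judge models can be as small as a counter that refuses work once a run-level cap would be exceeded. The class and exception names below are hypothetical, and the sketch assumes the caller already knows the prompt and completion token counts for each judge call.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the configured limit."""


@dataclass
class TokenBudget:
    limit: int      # maximum tokens allowed for the whole eval run
    used: int = 0   # tokens consumed so far

    def charge(self, prompt_tokens: int, completion_tokens: int) -> int:
        """Record usage for one judge call and return the remaining budget."""
        cost = prompt_tokens + completion_tokens
        if self.used + cost > self.limit:
            # Usage is left unchanged so the caller can skip or downgrade.
            raise BudgetExceeded(
                f"charge of {cost} tokens exceeds remaining {self.limit - self.used}"
            )
        self.used += cost
        return self.limit - self.used
```

Catching `BudgetExceeded` gives the eval runner a single place to decide whether to stop, sample fewer cases, or fall back to a cheaper judge.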

What you'll ship in production

Core responsibilities this discipline prepares you for.

  1. Build automated evaluation pipelines to continuously measure LLM output quality

    • Design evaluation harnesses with RAGAS, DeepEval, and NeMo Evaluator SDK for multi-metric scoring
    • Create evaluation datasets with ground-truth annotations and run cross-provider comparisons
    • Wire CI gates that automatically block deployments when faithfulness or relevance scores degrade
  2. Conduct red-team exercises: probe LLMs for vulnerabilities

    • Automate adversarial testing with Garak for prompt injection, jailbreak, and data extraction probes
    • Run multi-turn adversarial campaigns with Meta GOAT and DeepTeam for agent vulnerability testing
    • Execute red-team campaigns against realistic systems, discover vulnerabilities, and write actionable findings
  3. Implement production guardrails: content filters, PII detection, jailbreak prevention

    • Configure NeMo Guardrails with Colang policy language, Llama Guard 4, and Prompt Guard 2
    • Add Presidio for PII detection/redaction and Model Armor for Google-native content safety
    • Layer multiple defenses, test against comprehensive attack suites, and quantify safety-vs-helpfulness tradeoffs
  4. Design GenAI governance frameworks aligned with regulations

    • Map EU AI Act risk classification and implement NIST AI RMF control frameworks
    • Build OWASP LLM Top 10 mitigation strategies mapped to technical controls
    • Create governance artifacts, conduct risk assessments, and build automated audit trail pipelines
  5. Evaluate GenAI agent behavior: trajectory quality, tool selection accuracy

    • Build trajectory scoring systems measuring tool selection accuracy and task completion quality
    • Design human preference alignment tests and regression test suites for agent workflows
    • Evaluate multi-step agent executions to identify failure modes and build targeted regression tests
  6. Monitor bias, fairness, and hallucination rates in production

    • Detect bias across protected attributes using statistical fairness metrics and disparity analysis
    • Measure hallucination rates through ground-truth comparison and citation verification
    • Implement continuous bias scanning, hallucination detection, and alerting for metric drift
  7. Build safety incident response processes for deployed GenAI systems

    • Design safety monitoring dashboards with severity-based alert routing and escalation paths
    • Build incident triage workflows with containment procedures and post-incident reporting templates
    • Simulate safety incidents end-to-end and practice the full detection-to-resolution workflow
  8. Design LlamaFirewall policies for agent safety

    • Configure LlamaFirewall middleware for controlling agent tool access and output filtering rules
    • Set up multi-agent safety boundaries with policy-based execution constraints
    • Validate firewall policies against adversarial scenarios where agents attempt to bypass controls

Curriculum

7 courses · each builds on previous goals
