Chapter 10

GenAI CI/CD Pipelines

Argo Workflows · CD pipeline templates · artifact validation · promotion pipelines · pipeline observability

Learning Path

Hands-on Labs

Each objective has a coding lab that opens in VS Code in your browser.

Objective 1

Build prompt CI/CD pipeline

Goal

You will build an Argo Workflow pipeline for prompt CI/CD. Define a `WorkflowTemplate` with stages: `lint` (validate prompt YAML/JSON syntax, check required fields), `validate` (verify the prompt references valid model IDs, check that the token count is within limits), `eval` (run a Promptfoo evaluation against a golden test set, failing the run if quality drops below the threshold), `promote` (update the prompt ConfigMap in the target environment via a Git commit). Implement the pipeline as a DAG: `lint` and `validate` run in parallel, followed by `eval`, then `promote`. Configure Argo Events to trigger the pipeline on a Git push to the `prompts/` directory. Store pipeline artifacts (eval results, diff reports) in MinIO/S3. Build a `PromptPipelineConfig` Pydantic model that defines per-prompt eval thresholds and promotion gates. Track pipeline metrics: `prompt_pipeline_duration_seconds`, `prompt_pipeline_result{stage,result}`.
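As a concrete starting point, here is a minimal sketch of the `PromptPipelineConfig` model, assuming Pydantic; every name other than the class itself (e.g. `eval_threshold`, `gates`, `target_env`) is illustrative rather than prescribed by the lab.

```python
from enum import Enum
from pydantic import BaseModel, Field


class Environment(str, Enum):
    DEV = "dev"
    STAGING = "staging"
    PROD = "prod"


class PromotionGate(BaseModel):
    """One gate a prompt must pass before the `promote` stage runs."""
    name: str                                    # e.g. "eval", "token-budget"
    required: bool = True                        # hard gate vs. advisory
    min_score: float = Field(0.8, ge=0.0, le=1.0)


class PromptPipelineConfig(BaseModel):
    """Per-prompt eval thresholds and promotion gates."""
    prompt_id: str
    model_ids: list[str]                         # model IDs the prompt may reference
    max_tokens: int = Field(4096, gt=0)          # checked by the `validate` stage
    eval_threshold: float = Field(0.85, ge=0.0, le=1.0)  # Promptfoo pass-rate gate
    gates: list[PromotionGate] = Field(default_factory=list)
    target_env: Environment = Environment.STAGING
```

The `eval` stage would load this model from the repo, run Promptfoo, and compare the pass rate against `eval_threshold` before allowing `promote` to proceed.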

Objective 2

Build model config and RAG config pipelines

Goal

You will build CI/CD pipelines for model configuration and RAG configuration changes. Model config pipeline: `validate` (verify the model ID exists at the provider, check that pricing data is current), `shadow-test` (deploy the model config to a shadow environment, send a copy of production traffic, compare quality and cost), `promote` (update the LiteLLM config via Git). RAG config pipeline: `validate` (verify the embedding model is available, check that chunking parameters are valid), `index-test` (build a test index with the new config, run a RAGAS evaluation comparing old versus new retrieval quality), `promote` (update the RAG config and trigger the reindexing workflow). Implement rollback triggers: if quality metrics degrade within 1 hour of promotion, automatically revert the config change. Store all pipeline results for an audit trail.
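A hedged sketch of the rollback trigger, assuming quality metrics live in Prometheus (queried via its standard `/api/v1/query` HTTP API) and that promotion happened through a Git commit that can be reverted; the metric name `rag_answer_quality`, the baseline value, and the Prometheus address are placeholders.

```python
import subprocess

import requests

PROM_URL = "http://prometheus:9090/api/v1/query"  # assumed in-cluster address


def quality_degraded(baseline: float, tolerance: float = 0.05) -> bool:
    """True if average quality over the last hour fell below the pre-promotion baseline."""
    # `rag_answer_quality` is a placeholder; substitute the real quality metric.
    resp = requests.get(
        PROM_URL,
        params={"query": "avg_over_time(rag_answer_quality[1h])"},
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    if not result:
        return False  # no signal: don't auto-revert on missing data
    current = float(result[0]["value"][1])
    return current < baseline - tolerance


def revert_promotion(commit_sha: str) -> None:
    """Revert the promoting commit so GitOps reconciles the previous config."""
    subprocess.run(["git", "revert", "--no-edit", commit_sha], check=True)
    subprocess.run(["git", "push"], check=True)


if quality_degraded(baseline=0.87):
    revert_promotion("<promotion-commit-sha>")  # placeholder SHA
```

Reverting through Git keeps the audit trail intact: the rollback itself is a commit the pipeline can store alongside the eval results.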

Objective 3

Implement pipeline observability

Goal

You will build observability for all GenAI CI/CD pipelines. Instrument Argo Workflow metrics: scrape Argo's built-in Prometheus endpoint for workflow duration, step duration, success/failure rates. Build custom metrics: `pipeline_promotion_velocity{artifact_type}` (time from commit to production), `pipeline_gate_block_rate{gate}` (how often each gate blocks promotion), `pipeline_rollback_rate{artifact_type}` (how often promotions get rolled back). Create a Grafana pipeline health dashboard: pipeline execution timeline, success rate per artifact type, average promotion velocity trend, and gate effectiveness (block rate vs false positive rate). Implement pipeline SLOs: 95% of prompt changes reach production within 30 minutes, 99% of pipeline runs complete without infrastructure failure. Track pipeline SLO compliance.
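A counter-based sketch of the custom metrics using `prometheus_client`; the `_rate` metrics named above would be derived in PromQL from these counters (blocks divided by evaluations, rollbacks divided by promotions), and the velocity histogram carries a `_seconds` unit suffix per Prometheus naming conventions. Bucket boundaries and the port are assumptions.

```python
from prometheus_client import Counter, Histogram, start_http_server

# Time from Git commit to production, per artifact type (prompt, model-config, rag-config).
PROMOTION_VELOCITY = Histogram(
    "pipeline_promotion_velocity_seconds",
    "Time from commit to production promotion",
    ["artifact_type"],
    buckets=(60, 300, 600, 1800, 3600, 7200),  # 1m .. 2h, assumed buckets
)

# How often each gate blocks a promotion (numerator and denominator for block rate).
GATE_BLOCKS = Counter(
    "pipeline_gate_block_total",
    "Promotions blocked, by gate",
    ["gate"],
)
GATE_EVALUATIONS = Counter(
    "pipeline_gate_evaluations_total",
    "Gate evaluations, by gate",
    ["gate"],
)

# How often promotions get rolled back.
ROLLBACKS = Counter(
    "pipeline_rollback_total",
    "Promotions rolled back, by artifact type",
    ["artifact_type"],
)

start_http_server(9108)  # expose /metrics; port is an assumption
```

Gate block rate is then `rate(pipeline_gate_block_total[1h]) / rate(pipeline_gate_evaluations_total[1h])` in the Grafana dashboard.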

Objective 4

Build testing and validation for CI/CD pipelines for GenAI artifacts

Goal

You will build comprehensive testing and validation for the GenAI artifact CI/CD pipelines. Implement `GenAIArtifactPipelineTester`: define test scenarios that verify all critical paths work correctly under normal conditions, edge cases, and failure conditions. Build integration tests that verify the system integrates correctly with upstream and downstream components. Implement regression testing: maintain a test suite that runs on every configuration change to catch regressions. Build a `POST /api/v1/genai-artifact-pipelines/test` API that triggers the full test suite and returns results. Run tests on a schedule as Argo CronWorkflows. Track `test_pass_rate{system}` and `test_duration_seconds`. Build a test results dashboard showing pass rates, flaky tests, and coverage.
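A hedged sketch of the tester and its trigger endpoint, assuming FastAPI; the scenario registry, the result shape, and the single registered scenario are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable

from fastapi import FastAPI


@dataclass
class TestResult:
    name: str
    passed: bool
    detail: str = ""


@dataclass
class GenAIArtifactPipelineTester:
    """Runs registered scenarios covering happy paths, edge cases, and failures."""
    scenarios: dict[str, Callable[[], TestResult]] = field(default_factory=dict)

    def register(self, name: str):
        def wrap(fn: Callable[[], TestResult]):
            self.scenarios[name] = fn
            return fn
        return wrap

    def run_all(self) -> list[TestResult]:
        return [fn() for fn in self.scenarios.values()]


tester = GenAIArtifactPipelineTester()


@tester.register("prompt-lint-rejects-missing-fields")
def _lint_edge_case() -> TestResult:
    # Illustrative: a real scenario would invoke the lint stage with a bad prompt.
    return TestResult(name="prompt-lint-rejects-missing-fields", passed=True)


app = FastAPI()


@app.post("/api/v1/genai-artifact-pipelines/test")
def trigger_tests() -> dict:
    results = tester.run_all()
    return {
        "total": len(results),
        "passed": sum(r.passed for r in results),
        "results": [r.__dict__ for r in results],
    }
```

The same `run_all` entry point can be invoked from the scheduled CronWorkflow, so the API and the cron path exercise one code path.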

Objective 5

Implement performance optimization for CI/CD pipelines for GenAI artifacts

Goal

You will build performance monitoring and optimization for the GenAI artifact CI/CD pipelines. Implement `GenAIArtifactPipelineOptimizer`: instrument all critical paths with latency histograms, identify bottlenecks using p95/p99 analysis, and implement optimizations. Build capacity analysis: measure maximum throughput under load, identify scaling limits, and document capacity thresholds. Implement performance SLOs: define acceptable latency and throughput targets, track compliance, and alert on degradation. Build performance benchmarking: run standardized benchmarks on every significant change to detect performance regressions. Track `performance_benchmark_result{system}` and `performance_slo_compliance{system}`. Create a performance dashboard with trend analysis.
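A minimal instrumentation sketch with `prometheus_client`; the decorator, the stage names, the bucket boundaries, and the 120-second SLO are assumptions, not lab requirements.

```python
import time
from functools import wraps

from prometheus_client import Gauge, Histogram

STAGE_LATENCY = Histogram(
    "pipeline_stage_latency_seconds",
    "Latency of instrumented pipeline stages",
    ["system", "stage"],
    buckets=(0.1, 0.5, 1, 5, 15, 60, 300),  # assumed buckets, seconds
)
SLO_COMPLIANCE = Gauge(
    "performance_slo_compliance",
    "1 if the stage met its latency SLO on the last run, else 0",
    ["system", "stage"],
)


def instrumented(system: str, stage: str, slo_seconds: float):
    """Wrap a critical path: record latency and per-run SLO compliance."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                STAGE_LATENCY.labels(system, stage).observe(elapsed)
                SLO_COMPLIANCE.labels(system, stage).set(
                    1.0 if elapsed <= slo_seconds else 0.0
                )
        return wrapper
    return deco


@instrumented(system="prompt-pipeline", stage="eval", slo_seconds=120)
def run_eval():
    ...  # invoke the Promptfoo eval here
```

The histogram feeds the p95/p99 bottleneck analysis directly via `histogram_quantile` in Grafana.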

Objective 6

Build operational documentation for CI/CD pipelines for GenAI artifacts

Goal

You will build comprehensive operational documentation and runbooks for the GenAI artifact CI/CD pipelines. Implement `GenAIArtifactPipelineDocGenerator`: auto-generate architecture diagrams from deployed resources, a configuration reference from active configs, and API documentation from FastAPI OpenAPI specs. Build operational runbooks: document common operational tasks (scaling, configuration changes, troubleshooting), emergency procedures (failure recovery, rollback), and maintenance procedures (upgrades, data migrations). Implement documentation freshness: track when documentation was last updated versus when the system was last changed, and flag stale docs. Store documentation in Git with version tracking. Build a `GET /api/v1/genai-artifact-pipelines/docs` API serving the current documentation. Track `documentation_freshness{system}` and `documentation_coverage{system}`.
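A hedged sketch of the freshness check, assuming docs and deployment config live in the same Git repository; the `docs/` and `deploy/` paths and the staleness rule (docs last touched before the system was) are illustrative.

```python
import subprocess
from datetime import datetime, timezone


def last_commit_time(path: str) -> datetime:
    """Committer timestamp of the most recent commit touching `path`."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if not out:
        raise ValueError(f"no commits touch {path}")
    return datetime.fromtimestamp(int(out), tz=timezone.utc)


def docs_are_stale(doc_path: str = "docs/", system_path: str = "deploy/") -> bool:
    """Docs are stale if the system changed after the docs were last updated."""
    return last_commit_time(doc_path) < last_commit_time(system_path)


if docs_are_stale():
    # Feed this boolean into the documentation_freshness{system} gauge.
    print("documentation is stale: flag for update")
```

Because the check is pure Git metadata, it can run inside the same scheduled workflow that regenerates the docs, with no extra state to maintain.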