Build Argo Workflow templates for GenAI artifact CI/CD

You will build an Argo Workflow pipeline for prompt CI/CD. Define a `WorkflowTemplate` with stages: `lint` (validate prompt YAML/JSON syntax, check required fields), `validate` (verify prompt references valid model IDs, check token count within limits), `eval` (run Promptfoo evaluation against golden test set, fail if quality drops below threshold), `promote` (update prompt ConfigMap in target environment via Git commit). Implement the pipeline as a DAG with parallel lint+validate, then sequential eval, then promote. Configure Argo Events to trigger the pipeline on Git push to `prompts/` directory. Store pipeline artifacts (eval results, diff reports) in MinIO/S3. Build a `PromptPipelineConfig` Pydantic model that defines per-prompt eval thresholds and promotion gates. Track pipeline metrics: `prompt_pipeline_duration_seconds`, `prompt_pipeline_result{stage,result}`.

Introduction

When you ship prompts, model configs, RAG policies, or guardrail rules through the same pipeline you use for binaries, you discover the hard way that "build success" means nothing for an artifact whose correctness is semantic. A prompt that lints clean can still hallucinate; a RAG config that loads can still retrieve garbage. Teams that skip artifact-aware CI catch these regressions in production, after the merge button is pressed and after a customer notices. By the end of this lesson you'll be able to author reusable Argo WorkflowTemplate resources whose DAG stages — lint, validate, eval, promote — are parameterized per artifact type and generated programmatically from Python, so a single template skeleton drives CI for every GenAI artifact in your repo.

Key Terminology

WorkflowTemplate — an Argo custom resource that defines a reusable, parameterized pipeline skeleton; in this lesson it is the unit you build, version in Git, and reuse across every artifact type.
DAG template — Argo's template kind that lays tasks out as a directed acyclic graph with explicit dependency edges; it is what lets the eval stage fan out across datasets while lint and validate stay sequential.
Fan-out / fan-in — the pipeline shape where one upstream task triggers many parallel downstream tasks that all rejoin at a single barrier; it is how a prompt is scored against several eval datasets in parallel without serializing wall-clock.
Parameter substitution — Argo's {{workflow.parameters.<name>}} syntax for injecting runtime values into container commands; it is the mechanism that makes one template serve many artifact instances.
Quality gate — the entry condition on the promote stage that blocks publication when aggregate eval metrics fall below threshold; it is the failure mode you are trying to make impossible to bypass.
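
To make the quality gate definition concrete, here is a minimal sketch of a threshold check, assuming each eval branch reports a single score between 0 and 1. The names `GateConfig`, `min_mean_score`, `min_branch_score`, and `gate_passes` are hypothetical; they are not part of Argo or of this lesson's builder code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GateConfig:
    """Hypothetical per-artifact promotion thresholds."""
    min_mean_score: float    # floor for the average across all datasets
    min_branch_score: float  # floor for any single dataset


def gate_passes(scores: dict[str, float], config: GateConfig) -> bool:
    """Return True only if every branch and the aggregate clear their floors."""
    if not scores:
        # No eval results means nothing was verified, so block promotion.
        return False
    if min(scores.values()) < config.min_branch_score:
        return False
    return mean(scores.values()) >= config.min_mean_score


config = GateConfig(min_mean_score=0.85, min_branch_score=0.70)
print(gate_passes({"golden": 0.92, "adversarial": 0.81}, config))  # passes both floors
print(gate_passes({"golden": 0.95, "adversarial": 0.60}, config))  # one branch too low
```

Treating an empty result set as a failure is a deliberate choice in this sketch: a gate that passes when no scores arrive is a gate that can be bypassed by a broken eval stage.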

Concepts

Why Argo for GenAI artifact CI/CD

Argo runs workflows as Kubernetes-native custom resources, so each pipeline step gets its own container with its own image, environment, and resource budget. That isolation is what makes Argo a fit for GenAI artifacts: the eval stage often needs GPU access or network calls to an inference endpoint, while lint can run in a 50 MB Python image. Because Workflow and WorkflowTemplate are custom resources, you version them in Git next to the artifacts they validate, apply them with kubectl, and observe them through the Kubernetes API, the same surface you already use for everything else on the cluster.

The WorkflowTemplate resource is the reusable skeleton. It declares parameters, defines per-stage container templates, and exposes an entrypoint. Submitting a Workflow against a template just supplies parameter values, so one template serves every artifact of its type, and the same skeleton covers prompt pipelines, model-config pipelines, and RAG-config pipelines by switching the validation schema and the eval container (see Code Walkthrough).
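
As a rough illustration of "submitting a Workflow against a template", the sketch below builds a Workflow manifest that points at an existing template through `spec.workflowTemplateRef` and supplies only parameter values. The helper name `workflow_from_template` is hypothetical; the template name and parameter names are the ones used later in this lesson.

```python
import json


def workflow_from_template(template_name: str, parameters: dict[str, str]) -> dict:
    """Build a Workflow manifest that runs an existing WorkflowTemplate."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        # generateName lets Kubernetes append a unique suffix per run.
        "metadata": {"generateName": f"{template_name}-"},
        "spec": {
            "workflowTemplateRef": {"name": template_name},
            "arguments": {
                "parameters": [
                    {"name": key, "value": value}
                    for key, value in parameters.items()
                ]
            },
        },
    }


manifest = workflow_from_template(
    "prompt-cicd-pipeline",
    {"artifact-path": "prompts/foo.yaml", "git-revision": "main"},
)
print(json.dumps(manifest, indent=2))
```

Because the manifest uses `generateName` rather than a fixed name, it would be created with `kubectl create -f` (or `argo submit`) rather than `kubectl apply -f`.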

The four-stage DAG: lint, validate, eval, promote

Every GenAI artifact pipeline decomposes into four canonical stages. Lint checks structure — schema conformance, required fields, syntax. Validate runs semantic checks against reference data. Eval invokes model inference to score artifact quality, typically across several datasets in parallel. Promote publishes the validated artifact to the registry. These stages form a DAG, not a chain, because eval fans out across datasets while still depending on validate.

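The fan-out reduces wall-clock time by scoring against several eval datasets simultaneously; the fan-in keeps promote gated on every eval branch passing. Argo resolves the execution order from the DAG declaration alone. Retries and artifact passing between stages are also handled by Argo, but only once you declare them on the templates (a retryStrategy, and inputs/outputs artifacts backed by an artifact repository such as MinIO or S3).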
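
The shape of the DAG can be checked locally before anything reaches the cluster. The sketch below uses Python's standard `graphlib` module to group the tasks into waves of mutually independent work, assuming two eval datasets. It illustrates the dependency structure only; Argo's controller starts each task as soon as that task's own dependencies finish rather than waiting for a whole wave, and the function name `execution_waves` is hypothetical.

```python
from graphlib import TopologicalSorter

# Task name -> names of the tasks it depends on (two eval datasets assumed).
dependencies = {
    "lint": set(),
    "validate": {"lint"},
    "eval-0": {"validate"},
    "eval-1": {"validate"},
    "promote": {"eval-0", "eval-1"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in one wave have no edges between them."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(dependencies), start=1):
    print(number, wave)
# 1 ['lint']
# 2 ['validate']
# 3 ['eval-0', 'eval-1']
# 4 ['promote']
```

The third wave is the fan-out and the single-task fourth wave is the fan-in barrier.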

Template reuse over template proliferation

A platform with 50 prompts, 10 model configs, and 5 RAG policies should not maintain 65 pipeline definitions. Define one WorkflowTemplate per artifact type (four in this lesson: prompt, model config, RAG config, and guardrail policy), parameterize the validation schema, eval datasets, and promotion targets, and let the CI trigger inject the right values at submission time. Tag every template with an artifact-type label so operators can list every template of one category with a single selector. Note that a label on the template's metadata selects templates; to filter running Workflows and their metrics by category, the label also needs to reach the Workflows the template spawns, for example through spec.workflowMetadata.labels. This label-driven reuse is what makes the builder pattern later in this lesson worth the abstraction tax (see Code Walkthrough).
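
A small sketch of the one-template-per-type idea, under the assumption that templates follow a `<artifact-type>-cicd-pipeline` naming scheme. Only `prompt-cicd-pipeline` appears in this lesson; the other names, and the helpers `template_metadata` and `list_command`, are hypothetical.

```python
ARTIFACT_TYPES = ("prompt", "model-config", "rag-config", "guardrail-policy")


def template_metadata(artifact_type: str) -> dict:
    """Metadata for the one shared template that serves an artifact type."""
    if artifact_type not in ARTIFACT_TYPES:
        raise ValueError(f"unknown artifact type: {artifact_type}")
    return {
        "name": f"{artifact_type}-cicd-pipeline",  # assumed naming scheme
        "labels": {"artifact-type": artifact_type},
    }


def list_command(artifact_type: str) -> str:
    """kubectl command that lists the templates carrying a given label."""
    return f"kubectl get workflowtemplates -l artifact-type={artifact_type}"


templates = [template_metadata(t) for t in ARTIFACT_TYPES]
print(len(templates))  # four templates, however many artifacts the repo holds
print(list_command("prompt"))
```

The number of templates tracks the number of artifact types, not the number of artifacts, which is the whole point of parameterizing the path instead of baking it in.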

Code Walkthrough

The two snippets below show the concepts in motion: a WorkflowTemplateBuilder that emits a valid WorkflowTemplate dictionary, and a build_prompt_pipeline function that uses it to construct the four-stage lint → validate → eval fan-out → promote DAG for prompt artifacts.

```python
from enum import Enum
from typing import Optional


class GenAIArtifactType(Enum):
    PROMPT = "prompt"
    MODEL_CONFIG = "model-config"
    RAG_CONFIG = "rag-config"
    GUARDRAIL = "guardrail-policy"


class WorkflowTemplateBuilder:
    def __init__(self, name: str, artifact_type: GenAIArtifactType):
        self.name = name
        self.artifact_type = artifact_type
        self.stages: list[dict] = []
        # Parameters every GenAI pipeline needs; values are overridden at submit time.
        self.parameters: list[dict] = [
            {"name": "artifact-path", "value": ""},
            {"name": "artifact-type", "value": artifact_type.value},
            {"name": "git-revision", "value": "main"},
        ]

    def add_stage(self, stage_name: str, image: str,
                  command: list[str], dependencies: Optional[list[str]] = None,
                  resources: Optional[dict] = None) -> "WorkflowTemplateBuilder":
        # The DAG task and its container template share the stage name.
        task = {
            "name": stage_name,
            "template": stage_name,
            "dependencies": dependencies or [],
        }
        self.stages.append({"task": task, "image": image,
                            "command": command, "resources": resources})
        return self  # enables fluent chaining

    def build(self) -> dict:
        dag_tasks = [s["task"] for s in self.stages]
        templates = [self._make_container_template(s) for s in self.stages]
        # The entrypoint is a DAG template that references the container templates.
        templates.append({"name": "pipeline", "dag": {"tasks": dag_tasks}})
        return {
            "apiVersion": "argoproj.io/v1alpha1",
            "kind": "WorkflowTemplate",
            "metadata": {"name": self.name,
                         "labels": {"artifact-type": self.artifact_type.value}},
            "spec": {
                "entrypoint": "pipeline",
                "arguments": {"parameters": self.parameters},
                "templates": templates,
            },
        }

    def _make_container_template(self, stage: dict) -> dict:
        template = {
            "name": stage["task"]["name"],
            "container": {"image": stage["image"], "command": stage["command"]},
        }
        # Only emit a resources block when the stage asked for one.
        if stage.get("resources"):
            template["container"]["resources"] = stage["resources"]
        return template
```

GenAIArtifactType constrains callers to the four canonical categories, so a typo fails at construction rather than producing a broken template Kubernetes happily accepts. __init__ seeds the three parameters every GenAI pipeline needs (artifact-path, artifact-type, git-revision). add_stage appends one DAG task plus its container spec — the dependencies list is what wires the DAG together — and returns self for fluent chaining. build walks the staged data once to produce DAG tasks and container templates, then wraps everything in the Argo CRD shell.

```python
import yaml

# Assumes WorkflowTemplateBuilder and GenAIArtifactType from the previous
# snippet are in scope.


def build_prompt_pipeline(registry: str, eval_datasets: list[str]) -> dict:
    builder = WorkflowTemplateBuilder(
        name="prompt-cicd-pipeline",
        artifact_type=GenAIArtifactType.PROMPT,
    )
    builder.add_stage(
        stage_name="lint",
        image=f"{registry}/genai-lint:latest",
        command=["python", "-m", "genai_lint", "--type", "prompt",
                 "--path", "{{workflow.parameters.artifact-path}}"],
    )
    builder.add_stage(
        stage_name="validate",
        image=f"{registry}/genai-validate:latest",
        command=["python", "-m", "genai_validate", "--schema", "prompt-v2",
                 "--path", "{{workflow.parameters.artifact-path}}"],
        dependencies=["lint"],
    )
    # Fan-out: one eval task per dataset, each waiting only on validate.
    for idx, dataset in enumerate(eval_datasets):
        builder.add_stage(
            stage_name=f"eval-{idx}",
            image=f"{registry}/genai-eval:latest",
            command=["python", "-m", "genai_eval", "--dataset", dataset,
                     "--path", "{{workflow.parameters.artifact-path}}"],
            dependencies=["validate"],
            resources={"requests": {"cpu": "2", "memory": "4Gi"}},
        )
    # Fan-in: promote waits on every eval task generated above.
    eval_deps = [f"eval-{i}" for i in range(len(eval_datasets))]
    builder.add_stage(
        stage_name="promote",
        image=f"{registry}/genai-promote:latest",
        command=["python", "-m", "genai_promote", "--type", "prompt",
                 "--path", "{{workflow.parameters.artifact-path}}",
                 "--revision", "{{workflow.parameters.git-revision}}"],
        dependencies=eval_deps,
    )
    return builder.build()


def export_to_yaml(spec: dict, output_path: str) -> None:
    with open(output_path, "w") as f:
        yaml.dump(spec, f, default_flow_style=False, sort_keys=False)
```

build_prompt_pipeline chains the four stages, using one container image per stage and Argo's {{workflow.parameters.artifact-path}} substitution so the actual path is injected at submission time. The for loop produces one parallel eval-N task per dataset, each declaring validate as its only dependency — that is the fan-out. The promote stage collects every eval-N name into its own dependency list, producing the fan-in barrier that gates publication on every eval branch passing.

You'll know it works when kubectl apply -f on the dumped YAML creates a WorkflowTemplate named prompt-cicd-pipeline on the cluster, and argo submit --from workflowtemplate/prompt-cicd-pipeline -p artifact-path=prompts/foo.yaml produces a workflow whose argo get output shows lint → validate → one parallel eval per dataset (two, if you built the template with two datasets) → promote in DAG form, with each node transitioning to Succeeded.
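
If a CI job needs to issue that submission itself, one option is to assemble the command as an argument list rather than a shell string. The sketch below only builds and prints the command; the helper name `argo_submit_command` is hypothetical, while `--from` and `-p` are the same `argo submit` flags used above.

```python
import shlex


def argo_submit_command(template_name: str, parameters: dict[str, str]) -> list[str]:
    """Assemble an `argo submit` argv list for a WorkflowTemplate run."""
    command = ["argo", "submit", "--from", f"workflowtemplate/{template_name}"]
    for key, value in parameters.items():
        command += ["-p", f"{key}={value}"]
    return command


command = argo_submit_command(
    "prompt-cicd-pipeline", {"artifact-path": "prompts/foo.yaml"}
)
print(shlex.join(command))
# A CI job could pass the list straight to subprocess.run(command, check=True).
```

Keeping the command as a list avoids shell quoting problems when an artifact path contains spaces or other special characters.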

Do's and Don'ts

The following Do's and Don'ts distill the material above into practice.

Do's

✓Do use GenAIArtifactType to constrain the artifact category at WorkflowTemplateBuilder construction time. The enum rejects an unrecognized category in Python before build() produces a WorkflowTemplate whose artifact-type label Kubernetes would silently accept; a construction-time failure surfaces in milliseconds, while a mislabeled resource discovered after kubectl apply takes a debugging session.
✓Do derive the promote stage's dependencies programmatically from the same eval_datasets list that generates the eval tasks — building eval_deps = [f"eval-{i}" for i in range(len(eval_datasets))] from the same source guarantees the fan-in covers every eval-N branch; a hand-written dependency list diverges silently the moment a dataset is added or removed.
✓Do reference {{workflow.parameters.artifact-path}} in every container command and supply the real path at submission time via argo submit -p artifact-path=<value>. This substitution is what allows one WorkflowTemplate to run the full lint → validate → eval → promote pipeline against any prompt in the repo without authoring a separate template per artifact; RAG configs and guardrail policies get the same benefit from their own type-specific template built with the same builder.
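
To preview what a container will actually run, the substitution can be imitated locally. This is a rough stand-in written for illustration, not Argo's own templating code: it handles only the `{{workflow.parameters.<name>}}` form used in this lesson, and `preview_command` is a hypothetical helper.

```python
import re

PARAM_PATTERN = re.compile(r"\{\{\s*workflow\.parameters\.([A-Za-z0-9_-]+)\s*\}\}")


def preview_command(command: list[str], parameters: dict[str, str]) -> list[str]:
    """Substitute {{workflow.parameters.<name>}} locally to preview a command."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"no value supplied for parameter '{name}'")
        return parameters[name]

    return [PARAM_PATTERN.sub(replace, arg) for arg in command]


template_command = ["python", "-m", "genai_lint", "--type", "prompt",
                    "--path", "{{workflow.parameters.artifact-path}}"]
print(preview_command(template_command, {"artifact-path": "prompts/foo.yaml"}))
```

Raising on a missing parameter, rather than leaving the placeholder in place, makes a forgotten `-p` flag visible before the workflow is submitted.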

Don'ts

✗Don't omit the dependencies argument when calling add_stage for stages that must be sequenced. add_stage defaults dependencies to None, which it turns into an empty list, and an empty list tells Argo the task is ready to launch immediately; forgetting dependencies=["lint"] on the validate stage causes lint and validate to run concurrently, making the sequential gate meaningless even though the workflow can still reach Succeeded.
✗Don't hardcode the artifact path as a literal string inside the container command list — it collapses the WorkflowTemplate into a single-artifact fixture, forcing a new template per prompt or config and recreating exactly the per-artifact maintenance sprawl that the builder and parameter substitution are designed to eliminate.
✗Don't let the promote stage's dependencies list omit any eval-N name — if even one dataset's eval task is missing from the fan-in, Argo can transition promote to Running while that eval branch is still executing, publishing an artifact whose semantic quality was only partially verified and silently bypassing the regression gate the eval fan-out provides.
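
The dependency mistakes above are cheap to catch before a template is applied. The sketch below inspects a list of DAG tasks shaped like the ones in this lesson's `dag.tasks` and reports missing edges. `check_dag` is a hypothetical helper, and it assumes the lesson's naming convention (`lint`, `validate`, `eval-N`, `promote`).

```python
def check_dag(tasks: list[dict]) -> list[str]:
    """Return human-readable problems found in a list of Argo DAG tasks."""
    by_name = {task["name"]: task for task in tasks}
    problems = []

    # Every dependency must name a task that exists in the DAG.
    for task in tasks:
        for dep in task.get("dependencies", []):
            if dep not in by_name:
                problems.append(f"{task['name']} depends on unknown task {dep}")

    # The fan-in must cover every eval branch.
    promote_deps = set(by_name.get("promote", {}).get("dependencies", []))
    for name in sorted(n for n in by_name if n.startswith("eval-")):
        if name not in promote_deps:
            problems.append(f"promote does not wait for {name}")

    # validate must be sequenced after lint.
    if "validate" in by_name and "lint" not in by_name["validate"].get("dependencies", []):
        problems.append("validate does not wait for lint")
    return problems


tasks = [
    {"name": "lint", "dependencies": []},
    {"name": "validate", "dependencies": ["lint"]},
    {"name": "eval-0", "dependencies": ["validate"]},
    {"name": "eval-1", "dependencies": ["validate"]},
    {"name": "promote", "dependencies": ["eval-0"]},  # eval-1 left out on purpose
]
print(check_dag(tasks))  # ['promote does not wait for eval-1']
```

A check like this could run as a unit test over the dictionary the builder produces, so a broken fan-in fails the build of the pipeline itself rather than a production promotion.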
