Published taxonomy · v1.0 · 2026-06-14 · CC-BY-4.0

The GenAI Engineering Role Taxonomy

12 disciplines · 102 skills · 14 categories. Roles are curated from live GenAI job descriptions and vetted by practicing engineers; each role's responsibilities map to a skill ladder assessed by graded labs. The role is the unit; the skill is what gets measured.

Disciplines & responsibilities

GenAI Application Engineering

Build production RAG and prompt-chain applications, design streaming chat UIs, implement guardrails and evaluation, optimize LLM inference costs, and deploy on Kubernetes.

  • Design and build production GenAI features (chatbots, search, summarization) into web applications
  • Implement RAG pipelines with vector databases for enterprise search and knowledge retrieval
  • Optimize LLM inference for latency, cost, and reliability across multiple providers
  • Integrate LLM APIs (OpenAI, Gemini, Anthropic) into existing applications with error handling
  • Build GenAI agent features with tool calling, function execution, and human-in-the-loop workflows
  • Evaluate model outputs using automated metrics and LLM-as-judge for production quality
  • Containerize GenAI applications and deploy them on Kubernetes with CI/CD

GenAI Agent Engineering

Build autonomous multi-agent systems with planning, reasoning, tool use, memory, MCP/A2A protocols, safety boundaries, and production evaluation.

  • Design autonomous GenAI agents using state machines with tool calling, memory, and planning
  • Build multi-agent systems with supervisor/worker hierarchies, delegation, and parallel execution
  • Implement MCP servers and clients for standardized tool integration
  • Enable agent-to-agent communication using A2A protocol for cross-framework interoperability
  • Build production RAG agents with iterative retrieval, self-verification, and query decomposition
  • Implement guardrails and safety controls within agent workflows
  • Evaluate agent performance with trajectory analysis and cost tracking
  • Design context engineering — systematic composition of prompts, memory, tools, and history
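The state-machine responsibility above can be pictured with a small loop. This is a minimal sketch, not taken from any agent framework: `AgentState`, `scripted_policy`, `run_agent`, and the `add` tool are hypothetical names, and the scripted policy stands in for what would normally be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python callable.
TOOLS: dict[str, Callable[..., object]] = {
    "add": lambda a, b: a + b,
}

@dataclass
class AgentState:
    goal: str
    memory: list[str] = field(default_factory=list)  # short-term memory of observations
    step: str = "plan"                                # current state-machine node
    answer: object = None

def scripted_policy(state: AgentState) -> dict:
    """Stand-in for an LLM call: picks the next action from the current state."""
    if not state.memory:
        return {"action": "call_tool", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"action": "finish", "answer": state.memory[-1]}

def run_agent(goal: str, max_steps: int = 5) -> AgentState:
    state = AgentState(goal=goal)
    decision: dict = {}
    for _ in range(max_steps):  # hard step budget as a simple safety boundary
        if state.step == "plan":
            decision = scripted_policy(state)
            if decision["action"] == "call_tool":
                state.step = "act"
            else:
                state.answer = decision["answer"]
                state.step = "done"
        elif state.step == "act":
            result = TOOLS[decision["tool"]](**decision["args"])
            state.memory.append(f"{decision['tool']} -> {result}")
            state.step = "plan"
        if state.step == "done":
            break
    return state
```

Calling `run_agent("sum two numbers")` walks plan, act, plan, done. The step budget means a policy that never finishes still terminates, which is one place a safety boundary can live.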

GenAI Inference Engineering

Architect multi-provider LLM gateways, implement semantic caching and batch optimization, monitor provider SLAs, and optimize inference costs.

  • Design LLM gateway infrastructure routing requests across providers
  • Optimize request latency through caching, batching, and streaming
  • Implement structured output extraction from LLMs with type safety
  • Build cost attribution and FinOps dashboards tracking token spend
  • Monitor inference quality metrics in production
  • Implement intelligent routing — route queries to model tiers based on complexity
  • Manage API rate limits and quotas across providers
  • Deploy inference services on K8s with scaling and health checks
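Routing queries to model tiers by complexity, as listed above, could look roughly like the following. It is a sketch under loud assumptions: the tier names, the keyword list, and the word-count thresholds are all hypothetical, and a production router would more likely use a trained classifier governed by quality targets.

```python
# Hypothetical tier names; a real gateway would map these to provider models.
TIERS = ("small", "medium", "large")

# Hypothetical keywords that hint at multi-step reasoning.
REASONING_HINTS = ("why", "prove", "compare", "step by step", "trade-off")

def complexity_score(query: str) -> int:
    """Crude heuristic: longer queries and reasoning keywords score higher."""
    lowered = query.lower()
    words = lowered.split()
    score = 0
    if len(words) > 30:
        score += 1
    if len(words) > 120:
        score += 1
    if any(hint in lowered for hint in REASONING_HINTS):
        score += 1
    return score

def route(query: str) -> str:
    """Map the score onto a tier, clamping to the largest tier."""
    return TIERS[min(complexity_score(query), len(TIERS) - 1)]
```

A short factual question stays on the smallest tier, while a long prompt that also asks for a comparison is sent to the largest one.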

GenAI Platform Engineering

Build internal GenAI developer platforms with self-service capabilities, multi-tenancy, RBAC, and CI/CD for model, prompt, and guardrail pipelines.

  • Build the internal GenAI platform enabling developers to deploy LLM applications self-service
  • Design multi-tenant infrastructure with namespace isolation and RBAC
  • Implement CI/CD pipelines with GitOps for GenAI applications
  • Manage data infrastructure — databases, caches, message queues on K8s
  • Build autoscaling for GenAI workloads using event-driven scaling and batch job queuing
  • Provision infrastructure-as-code using K8s-native tooling
  • Implement full-stack observability across the GenAI platform
  • Operate LLM gateways as platform infrastructure

Forward Deployed GenAI Engineering

Rapid-prototype GenAI solutions on customer infrastructure, integrate GenAI with customer data and workflows, and scope solutions using a delivery methodology.

  • Embed on-site with clients to discover GenAI opportunities and scope projects
  • Build rapid prototypes that demonstrate GenAI value within weeks
  • Integrate GenAI into client data systems — databases, APIs, and legacy systems
  • Customize LLM applications for client-specific domains (healthcare, finance, legal)
  • Deploy solutions as packaged Helm charts clients can operate independently
  • Build GenAI agent workflows tailored to client business processes
  • Manage LLM provider costs and build FinOps models for client engagements
  • Configure enterprise guardrails to meet client compliance requirements

LLMOps Engineering

Monitor hallucination rates and token costs, operate guardrails and eval gates, and manage prompt versioning and canary deployments.

  • Design CI/CD pipelines for LLM application deployment
  • Monitor LLM systems in production — latency, errors, costs, quality
  • Manage LLM gateway operations — key rotation, failover, quota management
  • Implement FinOps practices — cost attribution, budgets, and optimization
  • Build continuous evaluation pipelines for production LLM quality
  • Detect and respond to prompt attacks and safety incidents in production
  • Manage data quality for RAG systems — freshness, drift, accuracy
  • Implement capacity planning — predict demand and right-size deployments

GenAI Safety & Evaluation Engineering

Design automated LLM evaluation pipelines, red-team GenAI systems, build bias detection and fairness benchmarks, and implement guardrails.

  • Build automated evaluation pipelines to continuously measure LLM output quality
  • Conduct red-team exercises — probe LLMs for vulnerabilities
  • Implement production guardrails — content filters, PII detection, jailbreak prevention
  • Design GenAI governance frameworks aligned with regulations
  • Evaluate GenAI agent behavior — trajectory quality, tool selection accuracy
  • Monitor bias, fairness, and hallucination rates in production
  • Build safety incident response processes for deployed GenAI systems
  • Design LlamaFirewall policies for agent safety

GenAI Security Engineering

Engineer defenses against prompt injection, jailbreaks, and data exfiltration. Implement PII leakage detection, content safety, and compliance.

  • Conduct adversarial red-team testing of LLM systems
  • Implement defense-in-depth guardrails — input validation, output filtering, content safety
  • Threat-model GenAI agent systems — analyze attack surfaces across tools, memory, and inter-agent communication
  • Build PII protection — detect, classify, and redact sensitive data in LLM pipelines
  • Design compliance programs aligned with OWASP LLM Top 10, MITRE ATLAS, EU AI Act
  • Build security monitoring for GenAI systems
  • Implement incident response for GenAI security events
  • Secure GenAI supply chain — model provenance, dependency scanning, container security
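The detect, classify, and redact step for PII named above can be sketched with deterministic filters. The two regular expressions below are illustrative assumptions only (a simple email shape and a US-style SSN shape) and would miss many real formats; `PII_PATTERNS` and `redact` are hypothetical names.

```python
import re

# Illustrative patterns only; they will miss many real-world formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a typed placeholder and count hits per class."""
    counts: dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

Returning per-class counts alongside the redacted text gives monitoring something to alert on without logging the sensitive values themselves.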

GenAI Solutions Architecture

Design enterprise GenAI reference architectures, create ADRs and technical standards, and bridge GenAI with enterprise workflows.

  • Define enterprise GenAI architecture with proper documentation and governance
  • Design scalable RAG systems at enterprise scale
  • Architect multi-agent systems with MCP mesh and A2A network topology
  • Lead PoC development and production rollouts with model selection and cost estimation
  • Design GenAI governance architecture — RBAC, audit trails, and compliance
  • Oversee operational architecture — observability, FinOps, SLA management
  • Integrate GenAI with enterprise data platforms — pipelines, knowledge graphs, streaming
  • Present architecture decisions with cost/risk analysis to leadership

GenAI Solutions & Delivery

Scope GenAI solutions with estimation, risk, and success criteria. Orchestrate delivery teams and manage client relationships.

  • Lead end-to-end GenAI project delivery from discovery through production handoff
  • Design GenAI architecture for client engagements
  • Build agent-based solutions for client business processes
  • Customize enterprise LLM deployments — gateways, RAG, domain adaptation
  • Manage FinOps for client GenAI projects
  • Scope project timelines and team requirements
  • Package solutions as deployable artifacts for client operations teams
  • Advise clients on technology roadmaps with emerging GenAI patterns

GenAI Engineering Leader

Hire and build GenAI engineering teams, design team structures for GenAI, and set engineering quality frameworks.

  • Hire and build GenAI engineering teams
  • Define engineering processes for GenAI development — eval-driven workflows
  • Manage quality and team performance for GenAI outputs
  • Understand the technical stack deeply enough to unblock teams
  • Operate and budget for GenAI infrastructure — FinOps and capacity
  • Design organization structure for GenAI engineering teams
  • Drive technical strategy — evaluate new tools and plan migrations
  • Ensure responsible AI practices across the team

GenAI Data Engineering

Build RAG data pipelines for ingestion, chunking, embedding, and indexing. Manage vector store operations and embedding model lifecycle.

  • Build embedding pipelines — ingest, chunk, embed, and store in vector databases
  • Design RAG data infrastructure — hybrid search and reranking
  • Build knowledge graph pipelines using Neo4j
  • Process documents at scale — parsing, chunking, and quality filtering
  • Implement data quality controls — PII, dedup, compliance filtering
  • Orchestrate data pipelines with scheduling and failure recovery
  • Monitor pipeline health — freshness, quality scores, embedding drift
  • Design multi-tenant data isolation for enterprise RAG
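The chunk stage of the ingest, chunk, embed, and store pipeline above is the easiest to show in isolation. The sketch below assumes the simplest strategy, fixed-size word windows with overlap; `chunk_words` and its default sizes are hypothetical, and semantic or layout-aware chunkers would replace it for many document types.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size word windows; consecutive chunks share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the end
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of embedding some words twice.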

Skill ladder (102 skills)

Agent core (10)
  • Agent Memory Systems: Implements short-term sliding windows, semantic memory, and context optimization for agents.
  • Agent State-Graph Patterns: Designs agent state-graph pipelines with typed schemas, nodes, edges, conditional routing, compilation, and state-transition debugging.
  • Agent Tool Design & Validation: Designs typed agent tools with Pydantic schemas, docstring parsing, and runtime validation.
  • Agentic RAG & Knowledge Graphs: Builds Self-RAG, Corrective RAG, and GraphRAG pipelines with adaptive retrieval and entity graphs.
  • Enterprise Vertical Agent Patterns: Builds document-processing, triage, and code-review agents with domain-specific tool sets and human handoff points.
  • LangGraph Framework Usage: Builds agents with the LangGraph library (StateGraph, conditional edges, MessagesState, ToolNode integration).
  • Multi-Agent Orchestration: Builds supervisor, hierarchical, and reflector multi-agent patterns with handoffs and result aggregation.
  • Multimodal & Computer-Use Agents: Builds vision, voice, computer-use, and code agents with multimodal models and desktop automation.
  • ReAct & Planning Agent Loops: Builds ReAct agent loops with thought-action-observation, planning, and dynamic replanning.
  • Web Browsing Agents: Builds agents that navigate web pages with Playwright, extract structured data, and submit forms.
Agent deployment (5)
  • Agent Cost Control & Model Routing: Tracks per-agent token spend, routes tasks to cost-appropriate models, and enforces budget limits.
  • Agent Load Testing & Capacity Planning: Runs concurrent-load benchmarks with k6 or Locust, identifies bottlenecks, and plans capacity for production agents.
  • Agent Observability & Tracing: Instruments agents with OpenTelemetry, Langfuse, fleet dashboards, and tool-use debugging.
  • Agent Release Management: Manages agent config versions with canary rollout, automated rollback, and config drift detection.
  • Production Agent Deployment: Serves agents via FastAPI on Kubernetes with Postgres/Redis state, horizontal scaling, and CI/CD pipelines.
Agent infrastructure (2)
  • A2A Protocol & Agent Networks: Implements Agent-to-Agent protocol for discovery, authentication, and remote task delegation across agent fleets.
  • MCP Protocol Servers & Clients: Builds and consumes MCP servers using JSON-RPC 2.0 over stdio and SSE with tool, resource, and prompt exposure.
Agent safety (3)
  • Agent Evaluation & Benchmarking: Builds golden datasets, LLM-as-judge pipelines, trajectory scoring, and CI-gated regression testing for agents.
  • Agent Safety Guardrails & Injection Defense: Implements input/output guardrails, jailbreak detection, prompt injection defense, and safety boundaries.
  • Enterprise Agent Governance & Audit: Enforces audit trails, escalation policies, human-in-the-loop checkpoints, and compliance reporting on production agents.
Cost & economics (5)
  • Caching Strategies for Cost: Designs prompt, semantic, and tool-call caches with appropriate TTLs and invalidation, quantifying cost-per-hit and quality impact.
  • Cost Anomaly Monitoring: Instruments cost telemetry per feature/tenant and detects anomalies via baselines or statistical detectors before they become bills.
  • Cost-Aware Model Routing: Builds cascade and routing strategies that send easy queries to cheap models and hard queries to expensive ones, governed by quality SLOs.
  • GPU Capacity Planning: Plans GPU capacity using spot/reserved/on-demand mix, autoscaling envelopes, and queue-based load shedding to hit SLO at target cost.
  • LLM Cost Modeling: Models per-request token economics, p99 cost, and unit economics for LLM features; compares hosted vs. self-hosted total cost.
Customization (8)
  • Continued Pretraining for Domain Adaptation: Performs domain-adaptive continued pretraining on curated corpora and measures downstream-task improvement vs. base model.
  • Distributed Training Infrastructure: Configures distributed training with DeepSpeed, FSDP, or accelerate; understands ZeRO stages, gradient checkpointing, and mixed precision.
  • Few-Shot & In-Context Learning Design: Designs few-shot exemplar selection (k, ordering, similarity-based retrieval) and measures in-context learning quality.
  • Fine-Tuning Evaluation: Builds evaluation harnesses to compare base vs. fine-tuned models on task suites, regression sets, and held-out human preference data.
  • Preference Optimization (DPO/RLHF): Aligns models with human or AI preferences using DPO, IPO, KTO, or RLHF/RLAIF pipelines, including reward modeling fundamentals.
  • Production Prompt Template Engineering: Authors versioned production prompts with structured outputs, ablations, and prompt-variant A/B tests under load.
  • Supervised Fine-Tuning (LoRA/QLoRA): Fine-tunes open-weight LLMs with LoRA, QLoRA, and full SFT, manages training hyperparameters, and evaluates instruction-following gains.
  • Training Dataset Curation: Curates, deduplicates, and decontaminates training datasets; balances domain mixtures and applies quality filters.
Data engineering (10)
  • Chunking Strategies for RAG: Selects chunking strategies (fixed, recursive, semantic, hierarchical, late-chunking) per document class and measures retrieval impact.
  • Data Lake & Warehouse for AI: Models AI feature and event tables in BigQuery, Snowflake, or open-table formats (Iceberg, Delta) with appropriate partitioning and clustering.
  • Data Pipeline Orchestration: Designs idempotent batch and incremental pipelines using Airflow, Dagster, or Prefect, with retries, lineage, and SLAs.
  • Data Quality & Validation: Encodes data contracts, schema checks, drift detection, and quality SLOs using Great Expectations, dbt tests, or equivalent tooling.
  • Document Parsing & Extraction: Extracts structured content from PDF, DOCX, HTML, and scanned images using Unstructured, Docling, or comparable tooling, including layout-aware parsing.
  • Hybrid Retrieval & Reranking: Combines lexical (BM25) and dense retrieval, applies cross-encoder rerankers, and tunes retrieval-quality metrics (recall, MRR, nDCG).
  • Knowledge Graph Construction: Builds knowledge graphs from unstructured corpora, covering entity extraction, linking, deduplication, and graph schema design for retrieval.
  • PII & Data Governance: Detects and redacts PII, enforces data residency and retention policies, and tracks lineage for AI training and inference data.
  • Streaming Data with Kafka/Pulsar: Builds event-driven AI pipelines with Kafka or Pulsar, covering partitioning, consumer groups, exactly-once semantics, and schema evolution.
  • Vector Database Operations: Operates production vector DBs (Pinecone, Weaviate, Qdrant, pgvector), including index tuning, sharding, hybrid filters, and capacity planning.
Evaluation (8)
  • Agent Trajectory Evaluation: Evaluates end-to-end agent task success, including tool-call correctness, intermediate state validation, and trace-based replay scoring.
  • Bias, Fairness & Toxicity Testing: Audits models for demographic bias, fairness gaps, and toxicity using accepted suites and reports impact in plain terms.
  • Domain Benchmark Design: Designs domain-specific benchmarks with held-out splits, contamination checks, and diverse failure-mode coverage.
  • Factuality & Grounding Evaluation: Quantifies hallucination rate and grounding fidelity for RAG and agent outputs using span-level annotators or reference-based metrics.
  • LLM Evaluation Harnesses: Runs evaluations using lm-evaluation-harness, Inspect, OpenAI Evals, or custom harnesses with reproducible task specs.
  • LLM Regression Testing in CI: Wires evaluation suites into CI gates with golden-set tracking, drift alerts, and statistically valid regression thresholds.
  • LLM-as-Judge Evaluation: Designs LLM-judge rubrics with calibration, debiasing, and inter-judge agreement checks; knows when judges are unreliable.
  • Red-Teaming & Jailbreak Testing: Generates adversarial prompts, tests jailbreak resistance, and reports findings with severity and reproduction steps.
Foundations (9)
  • Async Python with asyncio: Writes concurrent async/await code with asyncio, gather, semaphores, and async HTTP clients.
  • Configuration & Secrets Management: Manages environment variables, .env files, and secrets safely with python-dotenv and decorators.
  • File I/O, JSON & Exception Handling: Reads and writes files, parses JSON, and handles errors with try/except and custom exceptions.
  • Python Classes & Dataclasses: Models data with classes, dataclasses, methods, and inheritance for structured Python code.
  • Python Core Programming: Writes Python programs using variables, control flow, functions, modules, and packages.
  • Python Data Pipelines with Polars: Builds data transformation pipelines with Polars, generators, and lazy evaluation for tabular data.
  • Python Data Structures & Comprehensions: Manipulates lists, tuples, dictionaries, and sets using slicing, iteration, and comprehensions.
  • Python Testing with pytest: Writes unit and integration tests with pytest fixtures, assertions, and mocking patterns.
  • Type Hints & Pydantic Models: Builds typed Python data models with type hints, generics, protocols, and Pydantic validation.
Inference optimization (10)
  • Continuous Batching & Inference Serving: Implements continuous and dynamic batching for high-throughput LLM serving using vLLM, TGI, or comparable engines.
  • GPU Kernel Programming Basics: Reads and authors basic Triton or CUDA kernels for custom ops, understands occupancy and memory coalescing fundamentals.
  • GPU Memory Management: Profiles CUDA memory, sizes batches to fit available VRAM, handles OOM gracefully, and uses gradient checkpointing or offloading for memory-bound workloads.
  • Inference Latency Profiling: Profiles p50/p95/p99 token-generation latency, isolates bottlenecks across tokenizer, attention, and decode phases, and reports actionable findings.
  • KV Cache Optimization: Tunes transformer decoder KV cache for throughput and memory; understands prefix caching, paged-attention, and cache eviction strategies.
  • Model Distillation & Pruning: Compresses large models via knowledge distillation and structured/unstructured pruning while preserving target metrics.
  • Model Quantization: Applies INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, GGUF, bitsandbytes) and measures quality vs. throughput trade-offs.
  • Model Serving Frameworks: Deploys LLMs via vLLM, TGI, TensorRT-LLM, or SGLang with appropriate engine flags, schedulers, and runtime configuration.
  • Multi-GPU Tensor & Pipeline Parallelism: Configures tensor-parallel and pipeline-parallel sharding across multiple GPUs to serve models that exceed single-GPU memory.
  • Speculative Decoding: Implements speculative-decoding strategies (draft models, Medusa, lookahead) to reduce decoder latency while preserving output distribution.
Infrastructure (7)
  • Docker Containerization for LLM Apps: Writes Dockerfiles with multi-stage builds, manages images, and runs containers with Compose.
  • Helm & Kustomize Packaging: Packages Kubernetes apps with Helm charts and manages environment overlays with Kustomize.
  • K8s Health Probes & Autoscaling: Configures liveness, readiness, startup probes, HPA, and PodDisruptionBudgets for resilient services.
  • Kubernetes Ingress, TLS & NetworkPolicy: Exposes services via Ingress with TLS termination and isolates traffic with NetworkPolicies.
  • Kubernetes RBAC & Troubleshooting: Applies RBAC, Pod Security Standards, and SecurityContext while debugging CrashLoopBackOff and OOMKilled pods.
  • Kubernetes Workloads, Pods & Services: Deploys pods, services, and Deployments to Kubernetes with rolling updates and DNS-based discovery.
  • Sandboxed Agent Code Execution: Isolates agent-generated code in containers with timeouts, cgroup resource limits, and input sanitization to prevent escape.
LLM core (9)
  • Embeddings & Semantic Search: Generates embeddings, computes cosine similarity, and builds semantic search over documents.
  • LangChain & LCEL Runnables: Composes LangChain Runnables with LCEL pipe syntax, streaming, batching, and configurable runtime fields.
  • LLM API Integration: Calls OpenAI, Anthropic, and Gemini APIs with auth, error handling, and response parsing.
  • LLM Cost & Resilience Optimization: Tracks token costs, applies retry with exponential backoff, and tunes prompts for budget.
  • LLM Function Calling & Tool Use: Defines tool schemas in JSON Schema and orchestrates multi-turn function calling across providers.
  • LLM Sampling & Structured Output: Controls LLM outputs with temperature, top-p, stop sequences, JSON mode, and structured schemas.
  • Multi-Provider Prompt Engineering: Builds versioned prompts with Jinja2, few-shot examples, and chain-of-thought across providers.
  • RAG Pipeline Fundamentals: Builds retrieval-augmented generation pipelines with chunking, retrieval, and citation.
  • Transformer Architecture Internals: Implements scaled dot-product attention and reasons about KV-cache memory, FFN dimensions, and quantization tradeoffs to choose inference strategies.
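Reasoning about KV-cache memory, mentioned in the last skill above, comes down to simple arithmetic: keys and values are stored for every layer, KV head, and cached token. The sketch below assumes a hypothetical model shape (32 layers, 8 KV heads of dimension 128, a 2-byte fp16 cache); `kv_cache_bytes` is an illustrative helper, not part of any serving framework.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Keys and values (the factor 2) are stored per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical model shape: 32 layers, 8 KV heads of dimension 128, fp16 cache.
per_token = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=1)
full_context = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=8192)
```

Under these assumed numbers the cache costs 128 KiB per token and 1 GiB for one 8192-token sequence, which is why batch size and context length are usually traded against each other when sizing a deployment.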
Security (8)
  • AI IAM & Secrets Management: Configures IAM, Workload Identity / IRSA, KMS, and short-lived credentials for AI workloads; rotates and audits secrets.
  • Compliance Frameworks for AI: Maps AI systems to SOC 2, ISO 27001, HIPAA, and EU AI Act controls; produces evidence and audit-ready documentation.
  • Model Supply-Chain Security: Verifies model provenance, signed weights, SBOMs, and dependency integrity for open-weight and hosted models.
  • Output Filtering & Data-Loss Prevention: Builds output-side DLP for PII, secrets, and proprietary IP, with deterministic filters layered with model-based classifiers.
  • Prompt Injection Defense: Identifies direct and indirect prompt-injection vectors and implements input filtering, isolation, and least-privilege tool gating.
  • Threat Modeling for AI Systems: Applies STRIDE / PASTA threat modeling to AI architectures including model, data, and agent-tool boundaries.
  • Vulnerability Scanning for AI Stacks: Runs SCA, SAST, container, and model-asset scanning in CI; triages and remediates findings with appropriate severity gates.
  • Zero-Trust Networking for AI: Enforces network isolation, egress allowlists, mTLS, and zero-trust policies for AI inference and training workloads.
Web APIs (8)
  • API Authentication & Authorization: Implements OAuth2, JWT, API keys, and role-based access control on FastAPI endpoints.
  • API Gateway & Routing: Builds reverse-proxy gateways with path routing, load balancing, and response aggregation.
  • API Observability: Instruments APIs with Prometheus metrics, OpenTelemetry traces, structured logging, and Grafana dashboards.
  • API Resilience Patterns: Applies rate limiting, circuit breakers, retries with backoff, and bulkhead isolation to API services.
  • API Testing & Versioning: Tests async endpoints with pytest and httpx, and manages API versions with deprecation strategies.
  • Async Databases with SQLAlchemy & Alembic: Models data with async SQLAlchemy ORM, manages migrations with Alembic, and applies the repository pattern.
  • FastAPI REST API Development: Builds production REST APIs with FastAPI using Pydantic validation, dependency injection, and async handlers.
  • Real-Time Streaming with SSE & WebSockets: Streams LLM responses with SSE and manages WebSocket connection lifecycles for real-time apps.
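Retries with backoff, named under API Resilience Patterns above, can be sketched in a few lines. The version below assumes exponential backoff with full jitter and treats only `ConnectionError` and `TimeoutError` as transient; `retry_with_backoff`, its parameter names, and its defaults are hypothetical, and the `sleep` hook exists only so the delays can be observed or stubbed.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(call: Callable[[], T], *, attempts: int = 4,
                       base_delay: float = 0.5, max_delay: float = 8.0,
                       retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on transient errors, doubling the delay cap after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads out retries
    raise RuntimeError("unreachable when attempts >= 1")
```

Non-transient failures such as validation errors are deliberately not retried, since repeating them only adds load and cost.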

Cite this taxonomy

Released under CC-BY-4.0 — cite it, link it, build on it.

GenBodha. "The GenAI Engineering Role Taxonomy v1.0." genbodha.ai, 2026-06-14. https://genbodha.ai/taxonomy (CC-BY-4.0).