Chapter 1

GenAI ADR Engine

architecture decision records, decision taxonomy, model selection, weighted criteria matrix, decision dependency graphs, telemetry validation, ADR governance, recommendation engines

Learning Path

Hands-on Labs

Each objective has a coding lab that opens in VS Code in your browser

Objective 1

Build ADR schema and decision taxonomy for GenAI technology choices

Goal

You will build an `ADRSchema` with a structured decision taxonomy covering GenAI-specific technology choices. Define a `DecisionCategory` enum with variants: `MODEL_SELECTION`, `HOSTING_STRATEGY`, `RAG_VS_FINETUNING`, `GUARDRAIL_PLACEMENT`, `EMBEDDING_PROVIDER`, `VECTOR_STORE_CHOICE`. Implement `ArchitectureDecisionRecord` as a Pydantic model with fields: `adr_id: str`, `title: str`, `category: DecisionCategory`, `status: ADRStatus` (PROPOSED, ACCEPTED, SUPERSEDED, DEPRECATED), `context: str`, `decision: str`, `consequences: list[str]`, `created_at: datetime`, `superseded_by: Optional[str]`, `decision_makers: list[str]`, `tags: list[str]`, `review_deadline: Optional[datetime]`, `priority: int`. Build `DecisionOption` models with `option_name: str`, `provider: str`, `scores: dict[str, float]` mapping criteria like `latency_score`, `cost_score`, `quality_score`, `security_score` to 0-1 normalized values, and `evidence: list[str]` containing URLs or benchmark references supporting each score assignment. Implement `WeightedCriteriaMatrix` that accepts a `criteria_weights: dict[str, float]` and computes `compute_weighted_score()` returning a ranked `ScoredOption` list with `total_score: float` and `rank: int` fields. Build `validate_weights()` ensuring all weights sum to 1.0 within floating-point tolerance, raising `InvalidWeightsError` with details if they do not. Store ADRs in PostgreSQL `architecture_decisions` table with columns: `id SERIAL PRIMARY KEY`, `adr_id VARCHAR(64) UNIQUE NOT NULL`, `title TEXT NOT NULL`, `category VARCHAR(32) NOT NULL`, `status VARCHAR(20) NOT NULL DEFAULT 'PROPOSED'`, `context TEXT`, `decision TEXT`, `consequences JSONB`, `created_at TIMESTAMPTZ DEFAULT NOW()`, `updated_at TIMESTAMPTZ`, `superseded_by VARCHAR(64) REFERENCES architecture_decisions(adr_id)`. 
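The weighted-scoring pieces above can be sketched as follows. This is a minimal, dependency-free version that uses plain dataclasses where the lab specifies Pydantic models; the class and field names follow the spec, but the tolerance value is an assumption:

```python
import math
from dataclasses import dataclass, field

class InvalidWeightsError(ValueError):
    """Raised when criteria weights do not sum to 1.0."""

@dataclass
class DecisionOption:
    option_name: str
    provider: str
    scores: dict              # criterion name -> 0-1 normalized value
    evidence: list = field(default_factory=list)

@dataclass
class ScoredOption:
    option_name: str
    total_score: float
    rank: int

class WeightedCriteriaMatrix:
    def __init__(self, criteria_weights: dict):
        self.criteria_weights = criteria_weights
        self.validate_weights()

    def validate_weights(self) -> None:
        # Weights must sum to 1.0 within floating-point tolerance.
        total = sum(self.criteria_weights.values())
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise InvalidWeightsError(f"weights sum to {total}, expected 1.0")

    def compute_weighted_score(self, options: list) -> list:
        # Dot product of each option's scores with the weight vector,
        # then rank descending by total score.
        scored = [
            ScoredOption(
                option_name=o.option_name,
                total_score=sum(w * o.scores.get(c, 0.0)
                                for c, w in self.criteria_weights.items()),
                rank=0,
            )
            for o in options
        ]
        scored.sort(key=lambda s: s.total_score, reverse=True)
        for i, s in enumerate(scored, start=1):
            s.rank = i
        return scored
```

Missing criteria default to 0.0 here; the lab may instead want a validation error for incomplete score maps.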
Create a `decision_options` junction table with `option_id SERIAL PRIMARY KEY`, `adr_id VARCHAR(64) REFERENCES architecture_decisions(adr_id)`, `option_name VARCHAR(128)`, `provider VARCHAR(64)`, `scores JSONB`, `evidence JSONB`. Build `create_adr()` FastAPI endpoint at `POST /api/v1/adrs` accepting an `ADRCreateRequest` body with required fields `title`, `category`, `context`, `decision`, and returning `ADRCreateResponse` with the generated `adr_id` and `created_at` timestamp. Implement `list_adrs()` at `GET /api/v1/adrs?category={cat}&status={status}&page={page}` with pagination and filtering support. Implement `get_adr_history()` at `GET /api/v1/adrs/{adr_id}/history` that returns the full version chain including all supersede links as a list of `ADRVersionEntry` objects with `adr_id`, `status`, `created_at`, `superseded_by`. Deploy `adr_versions_total{category,status}` Prometheus counter tracking ADR creation rates, `adr_options_evaluated_total{category}` counter tracking scoring activity, and `adr_creation_latency_seconds` histogram measuring endpoint performance. Build `supersede_adr()` at `POST /api/v1/adrs/{adr_id}/supersede` that atomically marks the old ADR as SUPERSEDED and links the new one via the `superseded_by` foreign key, enforcing referential integrity in a PostgreSQL transaction with `SELECT FOR UPDATE` to prevent concurrent supersede race conditions.
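The race-free supersede flow might look like the sketch below. The SQL follows the `architecture_decisions` schema above; the connection handling assumes psycopg-style context managers (commit on clean exit, rollback on exception), and the helper names (`SUPERSEDE_SQL`, the error handling) are illustrative rather than part of the lab solution:

```python
SUPERSEDE_SQL = [
    # Lock the old row so two concurrent supersede requests serialize.
    "SELECT status FROM architecture_decisions "
    "WHERE adr_id = %(old_id)s FOR UPDATE",
    # Mark the old ADR and link it to its successor in one statement.
    "UPDATE architecture_decisions "
    "SET status = 'SUPERSEDED', superseded_by = %(new_id)s, updated_at = NOW() "
    "WHERE adr_id = %(old_id)s",
]

def supersede_adr(conn, old_id: str, new_id: str) -> None:
    """Run the supersede inside one transaction; any exception rolls it back."""
    with conn:  # assumed psycopg semantics: commit on success, rollback on error
        with conn.cursor() as cur:
            params = {"old_id": old_id, "new_id": new_id}
            cur.execute(SUPERSEDE_SQL[0], params)
            row = cur.fetchone()
            if row is None:
                raise LookupError(f"ADR {old_id} not found")
            if row[0] == "SUPERSEDED":
                raise ValueError(f"ADR {old_id} was already superseded")
            cur.execute(SUPERSEDE_SQL[1], params)
```

The `SELECT ... FOR UPDATE` is what prevents the double-supersede race: the second concurrent request blocks on the row lock, then sees status `SUPERSEDED` and fails cleanly.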

Objective 2

Implement multi-provider model selection ADR workflow

Goal

You will build a `ModelSelectionWorkflow` that automates the process of comparing LLM providers and producing a scored ADR for model selection decisions. Implement `ModelCapabilityProfile` as a Pydantic model with fields: `provider: str`, `model_id: str`, `model_family: str`, `max_context_tokens: int`, `supports_streaming: bool`, `supports_tools: bool`, `supports_vision: bool`, `supports_structured_output: bool`, `cost_per_1k_input: float`, `cost_per_1k_output: float`, `median_ttft_ms: float`, `median_tps: float`, `rate_limit_rpm: int`, `rate_limit_tpm: int`, `deprecation_date: Optional[str]`. Build `run_capability_comparison()` that queries LiteLLM's model registry via `litellm.model_list` and enriches profiles with live benchmarks by sending a standardized `PromptSuite` containing 10 prompts across categories (reasoning, coding, summarization, extraction, structured output) to each provider via `litellm.completion()`. Define `PromptSuite` Pydantic model with `suite_id: str`, `version: str`, `prompts: list[BenchmarkPrompt]` where each `BenchmarkPrompt` has `prompt_id: str`, `category: str`, `text: str`, `expected_format: str`, `scoring_rubric: str`, `max_tokens: int`. Record each benchmark response in `BenchmarkResult` with `prompt_id`, `provider`, `model_id`, `response_text`, `latency_ms`, `input_tokens`, `output_tokens`, `cost`, `quality_score`. Implement `CostLatencyQualityAnalyzer` with `analyze_tradeoffs()` that constructs a 3D trade-off surface: x-axis is `cost_per_request`, y-axis is `p95_latency_ms`, z-axis is `quality_score` computed from Instructor-validated structured output accuracy using `instructor.from_litellm()`. Build `find_efficient_frontier()` that identifies models on the Pareto frontier across these three dimensions. 
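Pareto-frontier selection over the three axes can be sketched as below. `ModelPoint` is a stand-in for the richer per-model aggregates, under the assumption that cost and latency are minimized while quality is maximized:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelPoint:
    model_id: str
    cost_per_request: float   # lower is better
    p95_latency_ms: float     # lower is better
    quality_score: float      # higher is better

def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    no_worse = (a.cost_per_request <= b.cost_per_request
                and a.p95_latency_ms <= b.p95_latency_ms
                and a.quality_score >= b.quality_score)
    strictly_better = (a.cost_per_request < b.cost_per_request
                       or a.p95_latency_ms < b.p95_latency_ms
                       or a.quality_score > b.quality_score)
    return no_worse and strictly_better

def find_efficient_frontier(points: list) -> list:
    """Models not dominated by any other model are Pareto-optimal."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```

The O(n²) pairwise check is fine at the scale of a provider comparison (tens of models, not thousands).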
Store comparison results in PostgreSQL `model_comparisons` table with columns: `comparison_id VARCHAR(64) PRIMARY KEY`, `run_at TIMESTAMPTZ`, `prompt_suite_version VARCHAR(16)`, `providers_compared JSONB`, `results_json JSONB`, `best_overall VARCHAR(64)`, `best_cost VARCHAR(64)`, `best_quality VARCHAR(64)`, `best_latency VARCHAR(64)`, `pareto_optimal JSONB`. Build `generate_adr_document()` that calls `anthropic.messages.create()` with model `claude-sonnet-4-20250514` and a system prompt containing the structured comparison data, asking Claude to produce a well-reasoned ADR document. Parse the response into `GeneratedADR` Pydantic model with `title: str`, `context: str`, `decision: str`, `rationale: str`, `trade_offs: list[str]`, `risks: list[str]`, `alternatives_considered: list[str]`. Expose `POST /api/v1/adrs/model-selection` endpoint that triggers the full workflow, accepting `ModelSelectionRequest` with `candidate_models: list[str]`, `use_case: str`, `constraints: SelectionConstraints`. Emit `model_comparison_runs_total{provider_count}`, `model_comparison_duration_seconds`, and `model_comparison_quality_spread{suite_version}` Prometheus metrics. Implement `ModelSelectionValidator` with `validate_selection()` that checks the recommended model against minimum thresholds defined in `SelectionPolicy` Pydantic model with fields `min_quality_score: float`, `max_cost_per_request: float`, `max_p95_latency_ms: float`, `required_capabilities: list[str]`, `max_context_required: int`.
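A minimal version of `validate_selection()` against a `SelectionPolicy` might look like this, using a plain dict for the candidate's aggregated metrics (the dict keys are assumed from the capability profile fields above, and a dataclass stands in for the Pydantic model):

```python
from dataclasses import dataclass, field

@dataclass
class SelectionPolicy:
    min_quality_score: float
    max_cost_per_request: float
    max_p95_latency_ms: float
    required_capabilities: list = field(default_factory=list)
    max_context_required: int = 0

def validate_selection(policy: SelectionPolicy, candidate: dict) -> list:
    """Return human-readable violations; an empty list means the pick clears the policy."""
    violations = []
    if candidate["quality_score"] < policy.min_quality_score:
        violations.append(f"quality {candidate['quality_score']} below minimum {policy.min_quality_score}")
    if candidate["cost_per_request"] > policy.max_cost_per_request:
        violations.append(f"cost {candidate['cost_per_request']} exceeds {policy.max_cost_per_request}")
    if candidate["p95_latency_ms"] > policy.max_p95_latency_ms:
        violations.append(f"p95 latency {candidate['p95_latency_ms']}ms exceeds {policy.max_p95_latency_ms}ms")
    missing = set(policy.required_capabilities) - set(candidate.get("capabilities", []))
    if missing:
        violations.append(f"missing capabilities: {sorted(missing)}")
    if candidate["max_context_tokens"] < policy.max_context_required:
        violations.append(f"context window {candidate['max_context_tokens']} below required {policy.max_context_required}")
    return violations
```

Returning all violations at once, rather than failing on the first, gives the generated ADR a complete picture of why a candidate was rejected.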

Objective 3

Validate ADR decisions against production telemetry

Goal

You will build a `DecisionValidator` that continuously checks whether the assumptions behind accepted ADRs still hold by comparing them against live production telemetry. Implement `ADRAssumption` as a Pydantic model with fields: `assumption_id: str`, `adr_id: str`, `description: str`, `metric_name: str`, `operator: ComparisonOperator` (LT, GT, LTE, GTE, EQ, BETWEEN), `threshold: float`, `upper_bound: Optional[float]` (for BETWEEN operator), `measurement_window: timedelta`, `data_source: DataSource` (PROMETHEUS, POSTGRESQL, LANGFUSE). Build `extract_assumptions()` that parses an accepted ADR's context and decision fields using `litellm.completion()` with Instructor to extract testable assumptions as structured `ADRAssumption` objects, returning `ExtractionResult` with `assumptions: list[ADRAssumption]`, `confidence: float`, `unextractable_claims: list[str]`. Implement `validate_assumptions()` that queries Prometheus via `prometheus_api_client` for each assumption's `metric_name` over the `measurement_window`, using `custom_query()` for PromQL expressions, compares the result against the `threshold` using the specified `operator`, and returns a `ValidationResult` with `is_valid: bool`, `actual_value: float`, `deviation_pct: float`, `trend_direction: TrendDirection` (IMPROVING, STABLE, DEGRADING). Build `StalenessDetector` with `check_staleness()` that runs `validate_assumptions()` on a configurable schedule (default every 6 hours via `check_interval_hours: int`) and marks ADRs as STALE when any assumption fails validation for `consecutive_failures_threshold` (default 3) consecutive checks. Implement `StalenessState` tracking `consecutive_failures: int`, `last_valid_at: datetime`, `staleness_score: float`. 
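The operator comparison at the heart of `validate_assumptions()` reduces to a small dispatch. The `eq_tol` tolerance parameter and the signed convention for `deviation_pct` are assumptions, not from the lab spec:

```python
from enum import Enum

class ComparisonOperator(Enum):
    LT = "LT"
    GT = "GT"
    LTE = "LTE"
    GTE = "GTE"
    EQ = "EQ"
    BETWEEN = "BETWEEN"

def check_assumption(actual: float, operator: ComparisonOperator,
                     threshold: float, upper_bound: float = None,
                     eq_tol: float = 1e-9) -> bool:
    """Compare a measured metric value against an assumption's threshold."""
    if operator is ComparisonOperator.LT:
        return actual < threshold
    if operator is ComparisonOperator.GT:
        return actual > threshold
    if operator is ComparisonOperator.LTE:
        return actual <= threshold
    if operator is ComparisonOperator.GTE:
        return actual >= threshold
    if operator is ComparisonOperator.EQ:
        return abs(actual - threshold) <= eq_tol
    if operator is ComparisonOperator.BETWEEN:
        return threshold <= actual <= upper_bound
    raise ValueError(f"unknown operator: {operator}")

def deviation_pct(actual: float, threshold: float) -> float:
    """Signed percentage deviation of the measured value from the threshold."""
    if threshold == 0:
        return 0.0 if actual == 0 else float("inf")
    return (actual - threshold) / abs(threshold) * 100.0
```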
Store validation history in PostgreSQL `adr_validations` table with columns: `validation_id VARCHAR(64) PRIMARY KEY`, `adr_id VARCHAR(64) REFERENCES architecture_decisions(adr_id)`, `assumption_id VARCHAR(64)`, `checked_at TIMESTAMPTZ`, `is_valid BOOLEAN`, `actual_value FLOAT`, `deviation_pct FLOAT`, `trend_direction VARCHAR(16)`. Create index `idx_adr_validations_adr_id_checked_at` for efficient history queries. Emit Prometheus metrics: `adr_validation_checks_total{adr_id,result}`, `adr_staleness_score{adr_id}` (0-1 gauge where 1 means all assumptions valid), `adr_assumption_deviation_pct{adr_id,assumption_id}`. Configure Alertmanager rules firing `ADRStale` alert with severity `warning` when `adr_staleness_score` drops below 0.5 for more than 30 minutes. Build `GET /api/v1/adrs/{adr_id}/validation` FastAPI endpoint returning the validation history and current staleness score. Build `GET /api/v1/adrs/{adr_id}/assumptions` endpoint returning all extracted assumptions with their latest validation status. Implement `DecisionEffectivenessScorecard` that aggregates validation results across all ADRs into a Grafana dashboard showing assumption pass rates per category, staleness trends over 30 days, and a ranked table of most-invalidated decisions.
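The consecutive-failure bookkeeping for `StalenessDetector` can be sketched as a small state fold. `record_check()` is a hypothetical helper name; the score follows the gauge semantics above, where 1.0 means every assumption currently valid:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class StalenessState:
    consecutive_failures: int = 0
    last_valid_at: Optional[datetime] = None
    staleness_score: float = 1.0  # 1.0 = every assumption currently valid

def record_check(state: StalenessState, results: list,
                 failures_threshold: int = 3) -> bool:
    """Fold one validation round into the state; True means mark the ADR STALE.

    `results` is one bool per assumption from the latest validation pass.
    """
    passed = sum(1 for ok in results if ok)
    state.staleness_score = passed / len(results) if results else 1.0
    if passed == len(results):
        # A fully clean round resets the failure streak.
        state.consecutive_failures = 0
        state.last_valid_at = datetime.now(timezone.utc)
        return False
    state.consecutive_failures += 1
    return state.consecutive_failures >= failures_threshold
```

Resetting the streak on any fully clean round matches the "consecutive checks" wording; a flapping assumption never reaches the threshold.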

Objective 4

Build ADR dependency graph across system decisions

Goal

You will build a `DecisionDependencyGraph` that models relationships between architecture decisions as a directed acyclic graph, enabling impact analysis when upstream decisions change. Implement `DecisionNode` as a Pydantic model with fields: `adr_id: str`, `title: str`, `category: DecisionCategory`, `status: ADRStatus`, `created_at: datetime`, `depth: int` (distance from root nodes), `in_degree: int`, `out_degree: int`. Implement `DecisionEdge` with `edge_id: str`, `source_adr_id: str`, `target_adr_id: str`, `relationship: DependencyType` (REQUIRES, CONSTRAINS, ENABLES, CONFLICTS_WITH), `strength: float` (0-1), `rationale: str`, `created_at: datetime`. Store the graph in PostgreSQL using `decision_nodes` table with `adr_id VARCHAR(64) PRIMARY KEY REFERENCES architecture_decisions(adr_id)`, `depth INTEGER`, `in_degree INTEGER`, `out_degree INTEGER` and `decision_edges` table with `edge_id VARCHAR(64) PRIMARY KEY`, `source_adr_id VARCHAR(64) REFERENCES decision_nodes(adr_id)`, `target_adr_id VARCHAR(64) REFERENCES decision_nodes(adr_id)`, `relationship VARCHAR(20)`, `strength FLOAT`, `rationale TEXT`, `UNIQUE(source_adr_id, target_adr_id)`. Build `add_dependency()` FastAPI endpoint at `POST /api/v1/adrs/{adr_id}/dependencies` accepting `AddDependencyRequest` with `target_adr_id`, `relationship`, `strength`, `rationale`. Before inserting, run `detect_cycles()` using topological sort (Kahn's algorithm) to reject any edge that would create a cycle, returning a `CyclicDependencyError` with `cycle_path: list[str]` showing the full cycle. Implement `detect_conflicts()` that traverses the graph to find pairs of ADRs connected by CONFLICTS_WITH edges where both have status ACCEPTED, returning `ConflictReport` with `conflicts: list[ConflictPair]` where each has `adr_a: str`, `adr_b: str`, `conflict_description: str`, and emitting `adr_conflicts_active` Prometheus gauge. 
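Cycle rejection via Kahn's algorithm can be sketched as follows. This minimal version only answers whether the proposed edge closes a cycle, leaving the `cycle_path` reconstruction for `CyclicDependencyError` to the lab:

```python
from collections import defaultdict, deque

def would_create_cycle(edges: list, new_edge: tuple) -> bool:
    """Kahn's algorithm: topologically sort edges + new_edge; if the sort
    cannot consume every node, the leftover nodes contain a cycle."""
    candidate = list(edges) + [new_edge]
    adjacency = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for src, dst in candidate:
        adjacency[src].append(dst)
        in_degree[dst] += 1
        nodes.update((src, dst))
    # Seed the queue with every node that has no incoming edges.
    queue = deque(n for n in nodes if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in adjacency[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)
    return visited < len(nodes)
```

Running this check before the `INSERT` keeps the stored graph acyclic by construction.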
Build `compute_impact_radius()` that performs a breadth-first traversal from a given ADR node, collecting all downstream dependents at each hop level, and returns an `ImpactAnalysis` Pydantic model with `root_adr: str`, `affected_adrs: list[DecisionNode]`, `affected_depth: int`, `total_affected_count: int`, `risk_score: float` computed as sum of edge strengths along the longest path, `affected_categories: dict[str, int]` counting affected ADRs per category. Expose `GET /api/v1/adrs/{adr_id}/impact` endpoint returning the impact analysis. Implement `propagate_staleness()` that when an upstream ADR is marked STALE, automatically marks all downstream dependents for re-validation by inserting entries into `adr_revalidation_queue` table with `queue_id`, `adr_id`, `triggered_by`, `queued_at`, `priority` (based on depth from stale source). Emit `adr_revalidation_queue_depth` gauge and `adr_staleness_propagations_total{source_category}` counter. Build a graph visualization endpoint at `GET /api/v1/adrs/graph` returning `GraphVisualization` Pydantic model with `nodes: list[GraphNode]` and `edges: list[GraphEdge]` in a format consumable by Grafana's node graph panel.
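The breadth-first traversal behind `compute_impact_radius()` reduces to the sketch below, returning just the affected ADR IDs and the maximum hop depth rather than the full `ImpactAnalysis` model:

```python
from collections import defaultdict, deque

def compute_impact_radius(edges: list, root: str) -> tuple:
    """BFS over outgoing edges from root; returns (affected IDs in
    discovery order, max hop depth)."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    seen = {root}
    affected, max_depth = [], 0
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                affected.append(nxt)
                max_depth = max(max_depth, depth + 1)
                queue.append((nxt, depth + 1))
    return affected, max_depth
```

The same traversal order can feed `propagate_staleness()`: hop depth maps directly onto the re-validation queue `priority`.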

Objective 5

Implement ADR recommendation engine using historical outcomes

Goal

You will build an `ADRRecommendationEngine` that leverages historical decision outcomes to suggest optimal architecture choices for new scenarios based on similarity to past decisions. Implement `DecisionOutcome` as a Pydantic model with fields: `outcome_id: str`, `adr_id: str`, `measured_at: datetime`, `success_label: OutcomeLabel` (SUCCESS, PARTIAL, FAILURE), `metrics_snapshot: dict[str, float]` capturing key metrics at evaluation time (e.g., `latency_p95_ms`, `error_rate`, `cost_per_day`, `quality_score`), `lessons_learned: str`, `labeler: str`, `review_notes: str`, `time_to_evaluate_days: int`. Store outcomes in PostgreSQL `decision_outcomes` table with columns: `outcome_id VARCHAR(64) PRIMARY KEY`, `adr_id VARCHAR(64) REFERENCES architecture_decisions(adr_id)`, `measured_at TIMESTAMPTZ`, `success_label VARCHAR(16)`, `metrics_snapshot JSONB`, `lessons_learned TEXT`, `labeler VARCHAR(64)`, `review_notes TEXT`. Create index `idx_outcomes_adr_label` on `(adr_id, success_label)` for efficient outcome aggregation. Create index `idx_outcomes_measured_at` on `(measured_at)` for time-range queries. Build `track_outcome()` endpoint at `POST /api/v1/adrs/{adr_id}/outcomes` that records outcomes with validation ensuring the ADR exists and is in ACCEPTED status, returning `OutcomeTrackingResponse` with `outcome_id` and `total_outcomes_for_adr: int` and `success_rate: float`. Implement `embed_adr()` that generates a vector embedding of each ADR's concatenated `context + decision + consequences` text using `openai.embeddings.create(model='text-embedding-3-small', dimensions=1536)` and stores it in `adr_embeddings` table with columns `adr_id VARCHAR(64) PRIMARY KEY`, `embedding vector(1536)`, `embedded_at TIMESTAMPTZ`, `text_hash VARCHAR(64)` for cache invalidation when ADR content changes. 
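The `text_hash` invalidation check can be sketched with `hashlib`. The concatenation order of the embedded fields is an assumption; it only needs to stay stable between runs so unchanged ADRs hash identically:

```python
import hashlib

def adr_text(context: str, decision: str, consequences: list) -> str:
    """Concatenate the fields that feed the embedding, in a stable order."""
    return "\n".join([context, decision, *consequences])

def text_hash(text: str) -> str:
    """SHA-256 hex digest stored next to the embedding; if it changes, re-embed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_reembedding(stored_hash: str, context: str,
                      decision: str, consequences: list) -> bool:
    return stored_hash != text_hash(adr_text(context, decision, consequences))
```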
Build `find_similar_decisions()` that accepts a new decision context string, embeds it, and performs cosine similarity search against `adr_embeddings` using `SELECT ae.adr_id, ad.title, ad.category, ad.status, 1 - (ae.embedding <=> %s) AS similarity FROM adr_embeddings ae JOIN architecture_decisions ad ON ae.adr_id = ad.adr_id WHERE ad.status IN ('ACCEPTED', 'SUPERSEDED') ORDER BY ae.embedding <=> %s LIMIT 10`, enriching each result with outcome statistics from `SELECT success_label, COUNT(*) FROM decision_outcomes WHERE adr_id = %s GROUP BY success_label`. Return `SimilarDecisions` Pydantic model with `similar_adrs: list[SimilarADR]` where each has `adr_id`, `title`, `category`, `similarity`, `outcome_stats: dict[str, int]`. Implement `generate_recommendation()` that calls `anthropic.messages.create()` with the similar ADRs, their outcomes, and success/failure patterns as context, asking Claude to produce a `RecommendationReport` Pydantic model with `recommended_option: str`, `confidence: float`, `supporting_evidence: list[str]`, `risks: list[str]`, `alternative_options: list[str]`, `historical_success_rate: float`, `key_lessons: list[str]`. Build `ProactiveADRSuggester` that monitors a `technology_updates` Redis stream via `XREAD BLOCK 5000` for new model releases or pricing changes and triggers `generate_recommendation()` when a relevant update matches existing ADR categories by keyword similarity, publishing suggestions to `adr_suggestions` stream. Emit `adr_recommendations_generated_total{category,confidence_bucket}` Prometheus counter, `adr_recommendation_latency_seconds` histogram, and `adr_proactive_suggestions_total{trigger_type}` counter.
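The pgvector expression `1 - (ae.embedding <=> %s)` yields cosine similarity, since `<=>` is cosine distance. A pure-Python equivalent, useful for unit-testing the ranking logic without a database, might look like this (`rank_similar` is a hypothetical stand-in for the SQL `ORDER BY ... LIMIT 10`):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Equivalent of pgvector's 1 - (a <=> b): dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_similar(query: list, candidates: list, limit: int = 10) -> list:
    """candidates: list of (adr_id, embedding); returns (adr_id, similarity)
    pairs sorted best-first, mirroring the SQL above."""
    scored = [(adr_id, cosine_similarity(query, emb)) for adr_id, emb in candidates]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:limit]
```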

Objective 6

Create ADR governance dashboard and compliance audit

Goal

You will build an `ADRGovernanceDashboard` that provides organization-wide visibility into architecture decision health, coverage gaps, and compliance status. Implement `CoverageAnalyzer` with `compute_coverage()` that scans the system's deployed services from a `deployed_services` PostgreSQL table with columns `service_id VARCHAR(64) PRIMARY KEY`, `service_name VARCHAR(128)`, `team VARCHAR(64)`, `deployed_at TIMESTAMPTZ`, `technology_stack JSONB`, `has_llm_integration BOOLEAN` and cross-references against `architecture_decisions` to identify services with no corresponding ADRs, returning an `ADRCoverageReport` Pydantic model with `total_services: int`, `covered_services: int`, `coverage_pct: float`, `uncovered_services: list[UncoveredService]` where each has `service_name: str`, `team: str`, `risk_level: str` based on whether the service has LLM integration without documented decisions. Build a Grafana panel showing coverage percentage as a gauge with thresholds at 60% (yellow) and 80% (green), plus a table listing uncovered services sorted by risk. Implement `ReviewWorkflowEngine` with `submit_for_review()` that transitions an ADR from PROPOSED to PENDING_REVIEW, creates an entry in `adr_reviews` table with columns `review_id VARCHAR(64) PRIMARY KEY`, `adr_id VARCHAR(64)`, `reviewer VARCHAR(64)`, `due_date TIMESTAMPTZ`, `status VARCHAR(16)` (PENDING, APPROVED, REJECTED, EXPIRED), `review_comment TEXT`, `reviewed_at TIMESTAMPTZ`. Build `check_expiry()` running as a daily scheduled task via `asyncio` background task that marks ADRs without review activity past `due_date` as EXPIRED and emits `adr_reviews_expired_total` Prometheus counter. Implement `POST /api/v1/adrs/{adr_id}/approve` accepting `ApprovalRequest` with `reviewer: str`, `review_comment: str` and `POST /api/v1/adrs/{adr_id}/reject` accepting `RejectionRequest` with `reviewer: str`, `review_comment: str`, `required_changes: list[str]`. 
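The daily `check_expiry()` pass reduces to a filter over pending reviews. `expire_overdue_reviews()` is a hypothetical helper operating on plain dicts rather than database rows; in the lab it would run inside the `asyncio` background task and drive the `UPDATE` plus the `adr_reviews_expired_total` increment:

```python
from datetime import datetime, timezone

def expire_overdue_reviews(reviews: list, now: datetime = None) -> list:
    """Return review_ids that should flip to EXPIRED: still PENDING and past due_date."""
    now = now or datetime.now(timezone.utc)
    return [r["review_id"] for r in reviews
            if r["status"] == "PENDING" and r["due_date"] < now]
```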
Build `ComplianceReportGenerator` with `generate_report()` that aggregates: total ADRs by status using `SELECT status, COUNT(*) FROM architecture_decisions GROUP BY status`, average time from PROPOSED to ACCEPTED using `AVG(accepted_at - created_at)`, coverage percentage, stale ADR count, open conflicts count from the dependency graph, and outputs a `GovernanceComplianceReport` Pydantic model with `report_id: str`, `generated_at: datetime`, `total_adrs: int`, `adrs_by_status: dict[str, int]`, `avg_review_time_days: float`, `coverage_pct: float`, `stale_count: int`, `conflict_count: int`, `governance_score: float`. Expose `GET /api/v1/governance/report` endpoint returning the report. Create a comprehensive Grafana dashboard with panels: ADR status distribution pie chart, coverage gauge, staleness trend line over 90 days, review pipeline funnel (PROPOSED -> PENDING -> APPROVED), and conflict count time series. Emit `adr_governance_score` Prometheus gauge computed as weighted average of coverage (40%), freshness (30%), and conflict-free ratio (30%).
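The weighted governance score from the last sentence is a one-liner; inputs are assumed to be pre-normalized to 0-1 (coverage percentage divided by 100, freshness as the fraction of non-stale ADRs, and the conflict-free ratio likewise):

```python
def governance_score(coverage: float, freshness: float,
                     conflict_free: float) -> float:
    """Weighted average per the dashboard spec: coverage 40%,
    freshness 30%, conflict-free ratio 30%. All inputs in [0, 1]."""
    return 0.4 * coverage + 0.3 * freshness + 0.3 * conflict_free
```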