Chapter 1

GenAI ADR Engine

architecture decision records, decision taxonomy, model selection, weighted criteria matrix, decision dependency graphs, telemetry validation, ADR governance, recommendation engines

Learning Path

Hands-on Labs

Each objective has a coding lab that opens in VS Code in your browser

Objective 1

Build ADR schema and decision taxonomy for GenAI technology choices

Goal

You will build an `ADRSchema` with a structured decision taxonomy covering GenAI-specific technology choices. Define a `DecisionCategory` enum with variants: `MODEL_SELECTION`, `HOSTING_STRATEGY`, `RAG_VS_FINETUNING`, `GUARDRAIL_PLACEMENT`, `EMBEDDING_PROVIDER`, `VECTOR_STORE_CHOICE`. Implement `ArchitectureDecisionRecord` as a Pydantic model with fields: `adr_id: str`, `title: str`, `category: DecisionCategory`, `status: ADRStatus` (PROPOSED, ACCEPTED, SUPERSEDED, DEPRECATED), `context: str`, `decision: str`, `consequences: list[str]`, `created_at: datetime`, `superseded_by: Optional[str]`, `decision_makers: list[str]`, `tags: list[str]`, `review_deadline: Optional[datetime]`, `priority: int`. Build `DecisionOption` models with `option_name: str`, `provider: str`, `scores: dict[str, float]` mapping criteria like `latency_score`, `cost_score`, `quality_score`, `security_score` to 0-1 normalized values, and `evidence: list[str]` containing URLs or benchmark references supporting each score assignment. Implement `WeightedCriteriaMatrix` that accepts a `criteria_weights: dict[str, float]` and computes `compute_weighted_score()` returning a ranked `ScoredOption` list with `total_score: float` and `rank: int` fields. Build `validate_weights()` ensuring all weights sum to 1.0 within floating-point tolerance, raising `InvalidWeightsError` with details if they do not. Store ADRs in PostgreSQL `architecture_decisions` table with columns: `id SERIAL PRIMARY KEY`, `adr_id VARCHAR(64) UNIQUE NOT NULL`, `title TEXT NOT NULL`, `category VARCHAR(32) NOT NULL`, `status VARCHAR(20) NOT NULL DEFAULT 'PROPOSED'`, `context TEXT`, `decision TEXT`, `consequences JSONB`, `created_at TIMESTAMPTZ DEFAULT NOW()`, `updated_at TIMESTAMPTZ`, `superseded_by VARCHAR(64) REFERENCES architecture_decisions(adr_id)`. 
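The weighted-scoring pieces above can be sketched as follows. This is a minimal, dependency-free version that uses plain dataclasses where the lab specifies Pydantic models; the class and field names follow the spec, but the tolerance value is an assumption:

```python
import math
from dataclasses import dataclass, field

class InvalidWeightsError(ValueError):
    """Raised when criteria weights do not sum to 1.0."""

@dataclass
class DecisionOption:
    option_name: str
    provider: str
    scores: dict              # criterion name -> 0-1 normalized value
    evidence: list = field(default_factory=list)

@dataclass
class ScoredOption:
    option_name: str
    total_score: float
    rank: int

class WeightedCriteriaMatrix:
    def __init__(self, criteria_weights: dict):
        self.criteria_weights = criteria_weights
        self.validate_weights()

    def validate_weights(self) -> None:
        # Weights must sum to 1.0 within floating-point tolerance.
        total = sum(self.criteria_weights.values())
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise InvalidWeightsError(f"weights sum to {total}, expected 1.0")

    def compute_weighted_score(self, options: list) -> list:
        # Dot product of each option's scores with the weight vector,
        # then rank descending by total score.
        scored = [
            ScoredOption(
                option_name=o.option_name,
                total_score=sum(w * o.scores.get(c, 0.0)
                                for c, w in self.criteria_weights.items()),
                rank=0,
            )
            for o in options
        ]
        scored.sort(key=lambda s: s.total_score, reverse=True)
        for i, s in enumerate(scored, start=1):
            s.rank = i
        return scored
```

Missing criteria default to 0.0 here; the lab may instead want a validation error for incomplete score maps.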
Create a `decision_options` junction table with `option_id SERIAL PRIMARY KEY`, `adr_id VARCHAR(64) REFERENCES architecture_decisions(adr_id)`, `option_name VARCHAR(128)`, `provider VARCHAR(64)`, `scores JSONB`, `evidence JSONB`. Build `create_adr()` FastAPI endpoint at `POST /api/v1/adrs` accepting an `ADRCreateRequest` body with required fields `title`, `category`, `context`, `decision`, and returning `ADRCreateResponse` with the generated `adr_id` and `created_at` timestamp. Implement `list_adrs()` at `GET /api/v1/adrs?category={cat}&status={status}&page={page}` with pagination and filtering support. Implement `get_adr_history()` at `GET /api/v1/adrs/{adr_id}/history` that returns the full version chain including all supersede links as a list of `ADRVersionEntry` objects with `adr_id`, `status`, `created_at`, `superseded_by`. Deploy `adr_versions_total{category,status}` Prometheus counter tracking ADR creation rates, `adr_options_evaluated_total{category}` counter tracking scoring activity, and `adr_creation_latency_seconds` histogram measuring endpoint performance. Build `supersede_adr()` at `POST /api/v1/adrs/{adr_id}/supersede` that atomically marks the old ADR as SUPERSEDED and links the new one via the `superseded_by` foreign key, enforcing referential integrity in a PostgreSQL transaction with `SELECT FOR UPDATE` to prevent concurrent supersede race conditions.
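The race-free supersede flow might look like the sketch below. The SQL follows the `architecture_decisions` schema above; the connection handling assumes psycopg-style context managers (commit on clean exit, rollback on exception), and the helper names (`SUPERSEDE_SQL`, the error handling) are illustrative rather than part of the lab solution:

```python
SUPERSEDE_SQL = [
    # Lock the old row so two concurrent supersede requests serialize.
    "SELECT status FROM architecture_decisions "
    "WHERE adr_id = %(old_id)s FOR UPDATE",
    # Mark the old ADR and link it to its successor in one statement.
    "UPDATE architecture_decisions "
    "SET status = 'SUPERSEDED', superseded_by = %(new_id)s, updated_at = NOW() "
    "WHERE adr_id = %(old_id)s",
]

def supersede_adr(conn, old_id: str, new_id: str) -> None:
    """Run the supersede inside one transaction; any exception rolls it back."""
    with conn:  # assumed psycopg semantics: commit on success, rollback on error
        with conn.cursor() as cur:
            params = {"old_id": old_id, "new_id": new_id}
            cur.execute(SUPERSEDE_SQL[0], params)
            row = cur.fetchone()
            if row is None:
                raise LookupError(f"ADR {old_id} not found")
            if row[0] == "SUPERSEDED":
                raise ValueError(f"ADR {old_id} was already superseded")
            cur.execute(SUPERSEDE_SQL[1], params)
```

The `SELECT ... FOR UPDATE` is what prevents the double-supersede race: the second concurrent request blocks on the row lock, then sees status `SUPERSEDED` and fails cleanly.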

Objective 2

Implement multi-provider model selection ADR workflow

Goal

You will build a `ModelSelectionWorkflow` that automates the process of comparing LLM providers and producing a scored ADR for model selection decisions. Implement `ModelCapabilityProfile` as a Pydantic model with fields: `provider: str`, `model_id: str`, `model_family: str`, `max_context_tokens: int`, `supports_streaming: bool`, `supports_tools: bool`, `supports_vision: bool`, `supports_structured_output: bool`, `cost_per_1k_input: float`, `cost_per_1k_output: float`, `median_ttft_ms: float`, `median_tps: float`, `rate_limit_rpm: int`, `rate_limit_tpm: int`, `deprecation_date: Optional[str]`. Build `run_capability_comparison()` that queries LiteLLM's model registry via `litellm.model_list` and enriches profiles with live benchmarks by sending a standardized `PromptSuite` containing 10 prompts across categories (reasoning, coding, summarization, extraction, structured output) to each provider via `litellm.completion()`. Define `PromptSuite` Pydantic model with `suite_id: str`, `version: str`, `prompts: list[BenchmarkPrompt]` where each `BenchmarkPrompt` has `prompt_id: str`, `category: str`, `text: str`, `expected_format: str`, `scoring_rubric: str`, `max_tokens: int`. Record each benchmark response in `BenchmarkResult` with `prompt_id`, `provider`, `model_id`, `response_text`, `latency_ms`, `input_tokens`, `output_tokens`, `cost`, `quality_score`. Implement `CostLatencyQualityAnalyzer` with `analyze_tradeoffs()` that constructs a 3D trade-off surface: x-axis is `cost_per_request`, y-axis is `p95_latency_ms`, z-axis is `quality_score` computed from Instructor-validated structured output accuracy using `instructor.from_litellm()`. Build `find_efficient_frontier()` that identifies models on the Pareto frontier across these three dimensions. 
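Pareto-frontier selection over the three axes can be sketched as below. `ModelPoint` is a stand-in for the richer per-model aggregates, under the assumption that cost and latency are minimized while quality is maximized:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelPoint:
    model_id: str
    cost_per_request: float   # lower is better
    p95_latency_ms: float     # lower is better
    quality_score: float      # higher is better

def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    no_worse = (a.cost_per_request <= b.cost_per_request
                and a.p95_latency_ms <= b.p95_latency_ms
                and a.quality_score >= b.quality_score)
    strictly_better = (a.cost_per_request < b.cost_per_request
                       or a.p95_latency_ms < b.p95_latency_ms
                       or a.quality_score > b.quality_score)
    return no_worse and strictly_better

def find_efficient_frontier(points: list) -> list:
    """Models not dominated by any other model are Pareto-optimal."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```

The O(n²) pairwise check is fine at the scale of a provider comparison (tens of models, not thousands).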
Store comparison results in PostgreSQL `model_comparisons` table with columns: `comparison_id VARCHAR(64) PRIMARY KEY`, `run_at TIMESTAMPTZ`, `prompt_suite_version VARCHAR(16)`, `providers_compared JSONB`, `results_json JSONB`, `best_overall VARCHAR(64)`, `best_cost VARCHAR(64)`, `best_quality VARCHAR(64)`, `best_latency VARCHAR(64)`, `pareto_optimal JSONB`. Build `generate_adr_document()` that calls `anthropic.messages.create()` with model `claude-sonnet-4-20250514` and a system prompt containing the structured comparison data, asking Claude to produce a well-reasoned ADR document. Parse the response into `GeneratedADR` Pydantic model with `title: str`, `context: str`, `decision: str`, `rationale: str`, `trade_offs: list[str]`, `risks: list[str]`, `alternatives_considered: list[str]`. Expose `POST /api/v1/adrs/model-selection` endpoint that triggers the full workflow, accepting `ModelSelectionRequest` with `candidate_models: list[str]`, `use_case: str`, `constraints: SelectionConstraints`. Emit `model_comparison_runs_total{provider_count}`, `model_comparison_duration_seconds`, and `model_comparison_quality_spread{suite_version}` Prometheus metrics. Implement `ModelSelectionValidator` with `validate_selection()` that checks the recommended model against minimum thresholds defined in `SelectionPolicy` Pydantic model with fields `min_quality_score: float`, `max_cost_per_request: float`, `max_p95_latency_ms: float`, `required_capabilities: list[str]`, `max_context_required: int`.
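A minimal version of `validate_selection()` against a `SelectionPolicy` might look like this, using a plain dict for the candidate's aggregated metrics (the dict keys are assumed from the capability profile fields above, and a dataclass stands in for the Pydantic model):

```python
from dataclasses import dataclass, field

@dataclass
class SelectionPolicy:
    min_quality_score: float
    max_cost_per_request: float
    max_p95_latency_ms: float
    required_capabilities: list = field(default_factory=list)
    max_context_required: int = 0

def validate_selection(policy: SelectionPolicy, candidate: dict) -> list:
    """Return human-readable violations; an empty list means the pick clears the policy."""
    violations = []
    if candidate["quality_score"] < policy.min_quality_score:
        violations.append(f"quality {candidate['quality_score']} below minimum {policy.min_quality_score}")
    if candidate["cost_per_request"] > policy.max_cost_per_request:
        violations.append(f"cost {candidate['cost_per_request']} exceeds {policy.max_cost_per_request}")
    if candidate["p95_latency_ms"] > policy.max_p95_latency_ms:
        violations.append(f"p95 latency {candidate['p95_latency_ms']}ms exceeds {policy.max_p95_latency_ms}ms")
    missing = set(policy.required_capabilities) - set(candidate.get("capabilities", []))
    if missing:
        violations.append(f"missing capabilities: {sorted(missing)}")
    if candidate["max_context_tokens"] < policy.max_context_required:
        violations.append(f"context window {candidate['max_context_tokens']} below required {policy.max_context_required}")
    return violations
```

Returning all violations at once, rather than failing on the first, gives the generated ADR a complete picture of why a candidate was rejected.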

Objective 3

Validate ADR decisions against production telemetry

Goal

You will build a `DecisionValidator` that continuously checks whether the assumptions behind accepted ADRs still hold by comparing them against live production telemetry. Implement `ADRAssumption` as a Pydantic model with fields: `assumption_id: str`, `adr_id: str`, `description: str`, `metric_name: str`, `operator: ComparisonOperator` (LT, GT, LTE, GTE, EQ, BETWEEN), `threshold: float`, `upper_bound: Optional[float]` (for BETWEEN operator), `measurement_window: timedelta`, `data_source: DataSource` (PROMETHEUS, POSTGRESQL, LANGFUSE). Build `extract_assumptions()` that parses an accepted ADR's context and decision fields using `litellm.completion()` with Instructor to extract testable assumptions as structured `ADRAssumption` objects, returning `ExtractionResult` with `assumptions: list[ADRAssumption]`, `confidence: float`, `unextractable_claims: list[str]`. Implement `validate_assumptions()` that queries Prometheus via `prometheus_api_client` for each assumption's `metric_name` over the `measurement_window`, using `custom_query()` for PromQL expressions, compares the result against the `threshold` using the specified `operator`, and returns a `ValidationResult` with `is_valid: bool`, `actual_value: float`, `deviation_pct: float`, `trend_direction: TrendDirection` (IMPROVING, STABLE, DEGRADING). Build `StalenessDetector` with `check_staleness()` that runs `validate_assumptions()` on a configurable schedule (default every 6 hours via `check_interval_hours: int`) and marks ADRs as STALE when any assumption fails validation for `consecutive_failures_threshold` (default 3) consecutive checks. Implement `StalenessState` tracking `consecutive_failures: int`, `last_valid_at: datetime`, `staleness_score: float`. 
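The operator comparison at the heart of `validate_assumptions()` reduces to a small dispatch. The `eq_tol` tolerance parameter and the signed convention for `deviation_pct` are assumptions, not from the lab spec:

```python
from enum import Enum

class ComparisonOperator(Enum):
    LT = "LT"
    GT = "GT"
    LTE = "LTE"
    GTE = "GTE"
    EQ = "EQ"
    BETWEEN = "BETWEEN"

def check_assumption(actual: float, operator: ComparisonOperator,
                     threshold: float, upper_bound: float = None,
                     eq_tol: float = 1e-9) -> bool:
    """Compare a measured metric value against an assumption's threshold."""
    if operator is ComparisonOperator.LT:
        return actual < threshold
    if operator is ComparisonOperator.GT:
        return actual > threshold
    if operator is ComparisonOperator.LTE:
        return actual <= threshold
    if operator is ComparisonOperator.GTE:
        return actual >= threshold
    if operator is ComparisonOperator.EQ:
        return abs(actual - threshold) <= eq_tol
    if operator is ComparisonOperator.BETWEEN:
        return threshold <= actual <= upper_bound
    raise ValueError(f"unknown operator: {operator}")

def deviation_pct(actual: float, threshold: float) -> float:
    """Signed percentage deviation of the measured value from the threshold."""
    if threshold == 0:
        return 0.0 if actual == 0 else float("inf")
    return (actual - threshold) / abs(threshold) * 100.0
```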
Store validation history in PostgreSQL `adr_validations` table with columns: `validation_id VARCHAR(64) PRIMARY KEY`, `adr_id VARCHAR(64) REFERENCES architecture_decisions(adr_id)`, `assumption_id VARCHAR(64)`, `checked_at TIMESTAMPTZ`, `is_valid BOOLEAN`, `actual_value FLOAT`, `deviation_pct FLOAT`, `trend_direction VARCHAR(16)`. Create index `idx_adr_validations_adr_id_checked_at` for efficient history queries. Emit Prometheus metrics: `adr_validation_checks_total{adr_id,result}`, `adr_staleness_score{adr_id}` (0-1 gauge where 1 means all assumptions valid), `adr_assumption_deviation_pct{adr_id,assumption_id}`. Configure Alertmanager rules firing `ADRStale` alert with severity `warning` when `adr_staleness_score` drops below 0.5 for more than 30 minutes. Build `GET /api/v1/adrs/{adr_id}/validation` FastAPI endpoint returning the validation history and current staleness score. Build `GET /api/v1/adrs/{adr_id}/assumptions` endpoint returning all extracted assumptions with their latest validation status. Implement `DecisionEffectivenessScorecard` that aggregates validation results across all ADRs into a Grafana dashboard showing assumption pass rates per category, staleness trends over 30 days, and a ranked table of most-invalidated decisions.
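The consecutive-failure bookkeeping for `StalenessDetector` can be sketched as a small state fold. `record_check()` is a hypothetical helper name; the score follows the gauge semantics above, where 1.0 means every assumption currently valid:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class StalenessState:
    consecutive_failures: int = 0
    last_valid_at: Optional[datetime] = None
    staleness_score: float = 1.0  # 1.0 = every assumption currently valid

def record_check(state: StalenessState, results: list,
                 failures_threshold: int = 3) -> bool:
    """Fold one validation round into the state; True means mark the ADR STALE.

    `results` is one bool per assumption from the latest validation pass.
    """
    passed = sum(1 for ok in results if ok)
    state.staleness_score = passed / len(results) if results else 1.0
    if passed == len(results):
        # A fully clean round resets the failure streak.
        state.consecutive_failures = 0
        state.last_valid_at = datetime.now(timezone.utc)
        return False
    state.consecutive_failures += 1
    return state.consecutive_failures >= failures_threshold
```

Resetting the streak on any fully clean round matches the "consecutive checks" wording; a flapping assumption never reaches the threshold.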

Objective 4

Build ADR dependency graph across system decisions

Goal

You will build a `DecisionDependencyGraph` that models relationships between architecture decisions as a directed acyclic graph, enabling impact analysis when upstream decisions change. Implement `DecisionNode` as a Pydantic model with fields: `adr_id: str`, `title: str`, `category: DecisionCategory`, `status: ADRStatus`, `created_at: datetime`, `depth: int` (distance from root nodes), `in_degree: int`, `out_degree: int`. Implement `DecisionEdge` with `edge_id: str`, `source_adr_id: str`, `target_adr_id: str`, `relationship: DependencyType` (REQUIRES, CONSTRAINS, ENABLES, CONFLICTS_WITH), `strength: float` (0-1), `rationale: str`, `created_at: datetime`. Store the graph in PostgreSQL using `decision_nodes` table with `adr_id VARCHAR(64) PRIMARY KEY REFERENCES architecture_decisions(adr_id)`, `depth INTEGER`, `in_degree INTEGER`, `out_degree INTEGER` and `decision_edges` table with `edge_id VARCHAR(64) PRIMARY KEY`, `source_adr_id VARCHAR(64) REFERENCES decision_nodes(adr_id)`, `target_adr_id VARCHAR(64) REFERENCES decision_nodes(adr_id)`, `relationship VARCHAR(20)`, `strength FLOAT`, `rationale TEXT`, `UNIQUE(source_adr_id, target_adr_id)`. Build `add_dependency()` FastAPI endpoint at `POST /api/v1/adrs/{adr_id}/dependencies` accepting `AddDependencyRequest` with `target_adr_id`, `relationship`, `strength`, `rationale`. Before inserting, run `detect_cycles()` using topological sort (Kahn's algorithm) to reject any edge that would create a cycle, returning a `CyclicDependencyError` with `cycle_path: list[str]` showing the full cycle. Implement `detect_conflicts()` that traverses the graph to find pairs of ADRs connected by CONFLICTS_WITH edges where both have status ACCEPTED, returning `ConflictReport` with `conflicts: list[ConflictPair]` where each has `adr_a: str`, `adr_b: str`, `conflict_description: str`, and emitting `adr_conflicts_active` Prometheus gauge. 
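Cycle rejection via Kahn's algorithm can be sketched as follows. This minimal version only answers whether the proposed edge closes a cycle, leaving the `cycle_path` reconstruction for `CyclicDependencyError` to the lab:

```python
from collections import defaultdict, deque

def would_create_cycle(edges: list, new_edge: tuple) -> bool:
    """Kahn's algorithm: topologically sort edges + new_edge; if the sort
    cannot consume every node, the leftover nodes contain a cycle."""
    candidate = list(edges) + [new_edge]
    adjacency = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for src, dst in candidate:
        adjacency[src].append(dst)
        in_degree[dst] += 1
        nodes.update((src, dst))
    # Seed the queue with every node that has no incoming edges.
    queue = deque(n for n in nodes if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in adjacency[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)
    return visited < len(nodes)
```

Running this check before the `INSERT` keeps the stored graph acyclic by construction.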
Build `compute_impact_radius()` that performs a breadth-first traversal from a given ADR node, collecting all downstream dependents at each hop level, and returns an `ImpactAnalysis` Pydantic model with `root_adr: str`, `affected_adrs: list[DecisionNode]`, `affected_depth: int`, `total_affected_count: int`, `risk_score: float` computed as sum of edge strengths along the longest path, `affected_categories: dict[str, int]` counting affected ADRs per category. Expose `GET /api/v1/adrs/{adr_id}/impact` endpoint returning the impact analysis. Implement `propagate_staleness()` that when an upstream ADR is marked STALE, automatically marks all downstream dependents for re-validation by inserting entries into `adr_revalidation_queue` table with `queue_id`, `adr_id`, `triggered_by`, `queued_at`, `priority` (based on depth from stale source). Emit `adr_revalidation_queue_depth` gauge and `adr_staleness_propagations_total{source_category}` counter. Build a graph visualization endpoint at `GET /api/v1/adrs/graph` returning `GraphVisualization` Pydantic model with `nodes: list[GraphNode]` and `edges: list[GraphEdge]` in a format consumable by Grafana's node graph panel.
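The breadth-first traversal behind `compute_impact_radius()` reduces to the sketch below, returning just the affected ADR IDs and the maximum hop depth rather than the full `ImpactAnalysis` model:

```python
from collections import defaultdict, deque

def compute_impact_radius(edges: list, root: str) -> tuple:
    """BFS over outgoing edges from root; returns (affected IDs in
    discovery order, max hop depth)."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    seen = {root}
    affected, max_depth = [], 0
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                affected.append(nxt)
                max_depth = max(max_depth, depth + 1)
                queue.append((nxt, depth + 1))
    return affected, max_depth
```

The same traversal order can feed `propagate_staleness()`: hop depth maps directly onto the re-validation queue `priority`.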

Objective 5

Implement ADR recommendation engine using historical outcomes

Goal

You will build an `ADRRecommendationEngine` that leverages historical decision outcomes to suggest optimal architecture choices for new scenarios based on similarity to past decisions. Implement `DecisionOutcome` as a Pydantic model with fields: `outcome_id: str`, `adr_id: str`, `measured_at: datetime`, `success_label: OutcomeLabel` (SUCCESS, PARTIAL, FAILURE), `metrics_snapshot: dict[str, float]` capturing key metrics at evaluation time (e.g., `latency_p95_ms`, `error_rate`, `cost_per_day`, `quality_score`), `lessons_learned: str`, `labeler: str`, `review_notes: str`, `time_to_evaluate_days: int`. Store outcomes in PostgreSQL `decision_outcomes` table with columns: `outcome_id VARCHAR(64) PRIMARY KEY`, `adr_id VARCHAR(64) REFERENCES architecture_decisions(adr_id)`, `measured_at TIMESTAMPTZ`, `success_label VARCHAR(16)`, `metrics_snapshot JSONB`, `lessons_learned TEXT`, `labeler VARCHAR(64)`, `review_notes TEXT`. Create index `idx_outcomes_adr_label` on `(adr_id, success_label)` for efficient outcome aggregation. Create index `idx_outcomes_measured_at` on `(measured_at)` for time-range queries. Build `track_outcome()` endpoint at `POST /api/v1/adrs/{adr_id}/outcomes` that records outcomes with validation ensuring the ADR exists and is in ACCEPTED status, returning `OutcomeTrackingResponse` with `outcome_id` and `total_outcomes_for_adr: int` and `success_rate: float`. Implement `embed_adr()` that generates a vector embedding of each ADR's concatenated `context + decision + consequences` text using `openai.embeddings.create(model='text-embedding-3-small', dimensions=1536)` and stores it in `adr_embeddings` table with columns `adr_id VARCHAR(64) PRIMARY KEY`, `embedding vector(1536)`, `embedded_at TIMESTAMPTZ`, `text_hash VARCHAR(64)` for cache invalidation when ADR content changes. 
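The `text_hash` invalidation check can be sketched with `hashlib`. The concatenation order of the embedded fields is an assumption; it only needs to stay stable between runs so unchanged ADRs hash identically:

```python
import hashlib

def adr_text(context: str, decision: str, consequences: list) -> str:
    """Concatenate the fields that feed the embedding, in a stable order."""
    return "\n".join([context, decision, *consequences])

def text_hash(text: str) -> str:
    """SHA-256 hex digest stored next to the embedding; if it changes, re-embed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_reembedding(stored_hash: str, context: str,
                      decision: str, consequences: list) -> bool:
    return stored_hash != text_hash(adr_text(context, decision, consequences))
```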
Build `find_similar_decisions()` that accepts a new decision context string, embeds it, and performs cosine similarity search against `adr_embeddings` using `SELECT ae.adr_id, ad.title, ad.category, ad.status, 1 - (ae.embedding <=> %s) AS similarity FROM adr_embeddings ae JOIN architecture_decisions ad ON ae.adr_id = ad.adr_id WHERE ad.status IN ('ACCEPTED', 'SUPERSEDED') ORDER BY ae.embedding <=> %s LIMIT 10`, enriching each result with outcome statistics from `SELECT success_label, COUNT(*) FROM decision_outcomes WHERE adr_id = %s GROUP BY success_label`. Return `SimilarDecisions` Pydantic model with `similar_adrs: list[SimilarADR]` where each has `adr_id`, `title`, `category`, `similarity`, `outcome_stats: dict[str, int]`. Implement `generate_recommendation()` that calls `anthropic.messages.create()` with the similar ADRs, their outcomes, and success/failure patterns as context, asking Claude to produce a `RecommendationReport` Pydantic model with `recommended_option: str`, `confidence: float`, `supporting_evidence: list[str]`, `risks: list[str]`, `alternative_options: list[str]`, `historical_success_rate: float`, `key_lessons: list[str]`. Build `ProactiveADRSuggester` that monitors a `technology_updates` Redis stream via `XREAD BLOCK 5000` for new model releases or pricing changes and triggers `generate_recommendation()` when a relevant update matches existing ADR categories by keyword similarity, publishing suggestions to `adr_suggestions` stream. Emit `adr_recommendations_generated_total{category,confidence_bucket}` Prometheus counter, `adr_recommendation_latency_seconds` histogram, and `adr_proactive_suggestions_total{trigger_type}` counter.
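The pgvector expression `1 - (ae.embedding <=> %s)` yields cosine similarity, since `<=>` is cosine distance. A pure-Python equivalent, useful for unit-testing the ranking logic without a database, might look like this (`rank_similar` is a hypothetical stand-in for the SQL `ORDER BY ... LIMIT 10`):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Equivalent of pgvector's 1 - (a <=> b): dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_similar(query: list, candidates: list, limit: int = 10) -> list:
    """candidates: list of (adr_id, embedding); returns (adr_id, similarity)
    pairs sorted best-first, mirroring the SQL above."""
    scored = [(adr_id, cosine_similarity(query, emb)) for adr_id, emb in candidates]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:limit]
```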

Objective 6

Create ADR governance dashboard and compliance audit

Goal

You will build an `ADRGovernanceDashboard` that provides organization-wide visibility into architecture decision health, coverage gaps, and compliance status. Implement `CoverageAnalyzer` with `compute_coverage()` that scans the system's deployed services from a `deployed_services` PostgreSQL table with columns `service_id VARCHAR(64) PRIMARY KEY`, `service_name VARCHAR(128)`, `team VARCHAR(64)`, `deployed_at TIMESTAMPTZ`, `technology_stack JSONB`, `has_llm_integration BOOLEAN` and cross-references against `architecture_decisions` to identify services with no corresponding ADRs, returning an `ADRCoverageReport` Pydantic model with `total_services: int`, `covered_services: int`, `coverage_pct: float`, `uncovered_services: list[UncoveredService]` where each has `service_name: str`, `team: str`, `risk_level: str` based on whether the service has LLM integration without documented decisions. Build a Grafana panel showing coverage percentage as a gauge with thresholds at 60% (yellow) and 80% (green), plus a table listing uncovered services sorted by risk. Implement `ReviewWorkflowEngine` with `submit_for_review()` that transitions an ADR from PROPOSED to PENDING_REVIEW, creates an entry in `adr_reviews` table with columns `review_id VARCHAR(64) PRIMARY KEY`, `adr_id VARCHAR(64)`, `reviewer VARCHAR(64)`, `due_date TIMESTAMPTZ`, `status VARCHAR(16)` (PENDING, APPROVED, REJECTED, EXPIRED), `review_comment TEXT`, `reviewed_at TIMESTAMPTZ`. Build `check_expiry()` running as a daily scheduled task via `asyncio` background task that marks ADRs without review activity past `due_date` as EXPIRED and emits `adr_reviews_expired_total` Prometheus counter. Implement `POST /api/v1/adrs/{adr_id}/approve` accepting `ApprovalRequest` with `reviewer: str`, `review_comment: str` and `POST /api/v1/adrs/{adr_id}/reject` accepting `RejectionRequest` with `reviewer: str`, `review_comment: str`, `required_changes: list[str]`. 
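The daily `check_expiry()` pass reduces to a filter over pending reviews. `expire_overdue_reviews()` is a hypothetical helper operating on plain dicts rather than database rows; in the lab it would run inside the `asyncio` background task and drive the `UPDATE` plus the `adr_reviews_expired_total` increment:

```python
from datetime import datetime, timezone

def expire_overdue_reviews(reviews: list, now: datetime = None) -> list:
    """Return review_ids that should flip to EXPIRED: still PENDING and past due_date."""
    now = now or datetime.now(timezone.utc)
    return [r["review_id"] for r in reviews
            if r["status"] == "PENDING" and r["due_date"] < now]
```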
Build `ComplianceReportGenerator` with `generate_report()` that aggregates: total ADRs by status using `SELECT status, COUNT(*) FROM architecture_decisions GROUP BY status`, average time from PROPOSED to ACCEPTED using `AVG(accepted_at - created_at)`, coverage percentage, stale ADR count, open conflicts count from the dependency graph, and outputs a `GovernanceComplianceReport` Pydantic model with `report_id: str`, `generated_at: datetime`, `total_adrs: int`, `adrs_by_status: dict[str, int]`, `avg_review_time_days: float`, `coverage_pct: float`, `stale_count: int`, `conflict_count: int`, `governance_score: float`. Expose `GET /api/v1/governance/report` endpoint returning the report. Create a comprehensive Grafana dashboard with panels: ADR status distribution pie chart, coverage gauge, staleness trend line over 90 days, review pipeline funnel (PROPOSED -> PENDING -> APPROVED), and conflict count time series. Emit `adr_governance_score` Prometheus gauge computed as weighted average of coverage (40%), freshness (30%), and conflict-free ratio (30%).
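The weighted governance score from the last sentence is a one-liner; inputs are assumed to be pre-normalized to 0-1 (coverage percentage divided by 100, freshness as the fraction of non-stale ADRs, and the conflict-free ratio likewise):

```python
def governance_score(coverage: float, freshness: float,
                     conflict_free: float) -> float:
    """Weighted average per the dashboard spec: coverage 40%,
    freshness 30%, conflict-free ratio 30%. All inputs in [0, 1]."""
    return 0.4 * coverage + 0.3 * freshness + 0.3 * conflict_free
```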