Implement ADR recommendation engine using historical outcomes

You will build an `ADRRecommendationEngine` that leverages historical decision outcomes to suggest optimal architecture choices for new scenarios based on similarity to past decisions.

Implement `DecisionOutcome` as a Pydantic model with fields: `outcome_id: str`, `adr_id: str`, `measured_at: datetime`, `success_label: OutcomeLabel` (SUCCESS, PARTIAL, FAILURE), `metrics_snapshot: dict[str, float]` capturing key metrics at evaluation time (e.g., `latency_p95_ms`, `error_rate`, `cost_per_day`, `quality_score`), `lessons_learned: str`, `labeler: str`, `review_notes: str`, `time_to_evaluate_days: int`.

Store outcomes in a PostgreSQL `decision_outcomes` table with columns: `outcome_id VARCHAR(64) PRIMARY KEY`, `adr_id VARCHAR(64) REFERENCES architecture_decisions(adr_id)`, `measured_at TIMESTAMPTZ`, `success_label VARCHAR(16)`, `metrics_snapshot JSONB`, `lessons_learned TEXT`, `labeler VARCHAR(64)`, `review_notes TEXT`. Create index `idx_outcomes_adr_label` on `(adr_id, success_label)` for efficient outcome aggregation. Create index `idx_outcomes_measured_at` on `(measured_at)` for time-range queries.

Build a `track_outcome()` endpoint at `POST /api/v1/adrs/{adr_id}/outcomes` that records outcomes, with validation ensuring the ADR exists and is in ACCEPTED status, and returns `OutcomeTrackingResponse` with `outcome_id`, `total_outcomes_for_adr: int`, and `success_rate: float`.

Implement `embed_adr()` that generates a vector embedding of each ADR's concatenated `context + decision + consequences` text using `openai.embeddings.create(model='text-embedding-3-small', dimensions=1536)` and stores it in an `adr_embeddings` table with columns `adr_id VARCHAR(64) PRIMARY KEY`, `embedding vector(1536)`, `embedded_at TIMESTAMPTZ`, `text_hash VARCHAR(64)` for cache invalidation when ADR content changes.
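The `text_hash` column can drive cache invalidation along the following lines. This is a minimal sketch, assuming SHA-256 over the concatenated fields (its hex digest is 64 characters, which fits `VARCHAR(64)`). The helper names `adr_text_hash` and `needs_reembedding` and the newline separator are hypothetical and not part of the lab specification.

```python
import hashlib
from typing import Optional


def adr_text_hash(context: str, decision: str, consequences: str) -> str:
    """Hash the same concatenated text that would be sent to the embedding model."""
    # Assumption: a newline separator. Any fixed separator works as long as
    # the hashing and embedding steps build the text the same way.
    text = "\n".join([context, decision, consequences])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reembedding(stored_hash: Optional[str], context: str,
                      decision: str, consequences: str) -> bool:
    """Return True when no embedding exists yet or the ADR text has changed."""
    return stored_hash != adr_text_hash(context, decision, consequences)
```

Comparing hashes before calling the embedding API means an unchanged ADR costs one hash computation instead of one embedding request.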
Build `find_similar_decisions()` that accepts a new decision context string, embeds it, and performs cosine similarity search against `adr_embeddings` using `SELECT ae.adr_id, ad.title, ad.category, ad.status, 1 - (ae.embedding <=> %s) AS similarity FROM adr_embeddings ae JOIN architecture_decisions ad ON ae.adr_id = ad.adr_id WHERE ad.status IN ('ACCEPTED', 'SUPERSEDED') ORDER BY ae.embedding <=> %s LIMIT 10`, enriching each result with outcome statistics from `SELECT success_label, COUNT(*) FROM decision_outcomes WHERE adr_id = %s GROUP BY success_label`. Return `SimilarDecisions` Pydantic model with `similar_adrs: list[SimilarADR]` where each has `adr_id`, `title`, `category`, `similarity`, `outcome_stats: dict[str, int]`.

Implement `generate_recommendation()` that calls `anthropic.messages.create()` with the similar ADRs, their outcomes, and success/failure patterns as context, asking Claude to produce a `RecommendationReport` Pydantic model with `recommended_option: str`, `confidence: float`, `supporting_evidence: list[str]`, `risks: list[str]`, `alternative_options: list[str]`, `historical_success_rate: float`, `key_lessons: list[str]`.

Build `ProactiveADRSuggester` that monitors a `technology_updates` Redis stream via `XREAD BLOCK 5000` for new model releases or pricing changes and triggers `generate_recommendation()` when a relevant update matches existing ADR categories by keyword similarity, publishing suggestions to `adr_suggestions` stream.

Emit `adr_recommendations_generated_total{category,confidence_bucket}` Prometheus counter, `adr_recommendation_latency_seconds` histogram, and `adr_proactive_suggestions_total{trigger_type}` counter.
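In the similarity query, pgvector's `<=>` operator returns cosine distance, so `1 - (embedding <=> query)` is cosine similarity. The sketch below reproduces that ranking in plain Python for small in-memory checks. The function names and the choice to return 0.0 for a zero-length vector are assumptions made for illustration, not pgvector behavior.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, i.e. 1 minus the cosine distance used in the query."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption of this sketch: treat zero vectors as dissimilar
    return dot / (norm_a * norm_b)


def top_k_similar(query: list[float], embeddings: dict[str, list[float]],
                  k: int = 10) -> list[tuple[str, float]]:
    """Return up to k (adr_id, similarity) pairs, most similar first."""
    scored = [(adr_id, cosine_similarity(query, vec))
              for adr_id, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A reference implementation like this is mainly useful in unit tests, where a handful of hand-built vectors can confirm that the SQL ordering matches expectations.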

Create governance dashboards that surface undocumented decisions and enforce review workflows with expiry tracking

Introduction

When you ship GenAI architecture without an active governance layer, undocumented model swaps quietly displace approved ones, expired ADRs keep anchoring downstream choices nobody has revalidated, and the cost surfaces months later as a regression no one can trace back to a decision — because no decision was ever recorded. By the end of this lesson, you'll be able to design a governance dashboard that detects undocumented decisions from telemetry gaps, enforces review workflows with expiry tracking, and aggregates per-category health scores that feed back into the ADRRecommendationEngine so poorly governed decisions contribute less to future recommendations.

Key Terminology

  • GovernanceGap: a record representing an undocumented architecture decision detected when production telemetry surfaces a model, endpoint, or retrieval strategy that has no matching ADR in the registry; carries severity, environment, first-seen timestamp, and suggested category for triage.
  • ADRRecommendationEngine: the downstream component (introduced in the lab objective) that ranks historical architecture decisions when suggesting new ones; consumes per-category health scores from the dashboard so poorly governed decisions contribute less to future recommendations via a similarity_score * governance_weight ranking term.
  • CategoryHealth: a per-decision-category aggregate score (0.0–1.0) produced by the GovernanceDashboard by combining undocumented-decision counts, overdue reviews, and expired ADRs within that category; it is the unit of governance signal that flows back into the recommendation engine.

Concepts

Connecting Governance to the Recommendation Engine

The health scores produced by GovernanceDashboard feed directly into the ADRRecommendationEngine from the lab objective. When the recommendation engine retrieves historical outcomes to suggest optimal architecture choices, it weights each outcome by the health score of the decision's category. A past decision whose category scores 0.3 (indicating expired reviews and undocumented related decisions) contributes less to future recommendations than one in a well-governed category scoring 0.95. This creates a virtuous cycle: teams that maintain their ADRs get better recommendations, which incentivizes governance compliance.

The key integration point is the similarity_score * governance_weight multiplication in the recommendation engine's ranking function. By making governance health a first-class input to automated recommendations, you transform the dashboard from a compliance checkbox into a system that actively improves decision quality over time. Expired model selection ADRs do not just trigger Slack notifications—they degrade the recommendation engine's confidence in suggesting similar models, forcing teams to re-evaluate before the engine will confidently recommend that path again.
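The ranking term can be illustrated with a small sketch. The `Candidate` class and `rank_candidates` function are hypothetical names used for illustration; only the `similarity_score * governance_weight` product comes from the lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    adr_id: str
    similarity_score: float   # cosine similarity to the new decision context
    governance_weight: float  # health score (0.0 to 1.0) of the ADR's category

    @property
    def ranking_score(self) -> float:
        return self.similarity_score * self.governance_weight


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by governance-weighted similarity, best first."""
    return sorted(candidates, key=lambda c: c.ranking_score, reverse=True)


ranked = rank_candidates([
    Candidate("ADR-A", similarity_score=0.92, governance_weight=0.30),
    Candidate("ADR-B", similarity_score=0.80, governance_weight=0.95),
])
```

Here ADR-A is the closer textual match, but its poorly governed category cuts its score to roughly 0.28, so ADR-B (roughly 0.76) ranks first.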

This tight coupling between ADR governance, telemetry validation, and the recommendation engine is what distinguishes a production-grade ADR system from a documentation template. The dashboard is not a view layer—it is the control plane for your organization's GenAI architecture decisions.

Code Walkthrough

Now that you have seen how governance health scores flow into the ADRRecommendationEngine, the next step is making those scores concrete: first detecting undocumented decisions from telemetry gaps, then aggregating them into the per-category health signal the recommendation engine consumes.

The most dangerous architecture decisions are the ones nobody wrote down. A developer quietly switches from text-embedding-ada-002 to text-embedding-3-small in a single commit, and your RAG pipeline's recall changes without any ADR capturing the rationale. The detector cross-references two inventories: the declared set (ADR subjects in the registry) and the observed set (model identifiers, endpoints, and retrieval strategies seen in production telemetry). Anything observed without a matching ADR becomes a GovernanceGap.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


def _tokenize(text: str) -> set[str]:
    # Lowercase and split on hyphens, underscores, and whitespace so that
    # "text-embedding-3-small" and "Text Embedding 3 Small" compare equal.
    return set(text.lower().replace("-", " ").replace("_", " ").split())


class GapSeverity(Enum):
    CRITICAL = "critical"  # production, no ADR
    WARNING = "warning"    # non-production, no ADR


@dataclass
class GovernanceGap:
    observed_resource: str
    environment: str
    first_seen: datetime
    severity: GapSeverity
    suggested_category: str


@dataclass
class UndocumentedDecisionDetector:
    # Maps an ADR subject (the declared inventory) to its ADR id.
    adr_subjects: dict[str, str] = field(default_factory=dict)
    similarity_threshold: float = 0.85

    def detect_gaps(self, observations: list[dict]) -> list[GovernanceGap]:
        gaps: list[GovernanceGap] = []
        for obs in observations:
            # Anything observed in telemetry without a matching ADR is a gap.
            if self._match_adr(obs["resource_identifier"]) is None:
                sev = (GapSeverity.CRITICAL
                       if obs["environment"] == "production"
                       else GapSeverity.WARNING)
                gaps.append(GovernanceGap(
                    observed_resource=obs["resource_identifier"],
                    environment=obs["environment"],
                    # Before Python 3.11, fromisoformat() rejects a trailing "Z".
                    first_seen=datetime.fromisoformat(obs["timestamp"]),
                    severity=sev,
                    suggested_category=obs.get("category", "unknown"),
                ))
        return gaps

    def _match_adr(self, resource: str) -> Optional[str]:
        observed = _tokenize(resource)
        for subject, adr_id in self.adr_subjects.items():
            # Normalize the subject the same way as the resource; the original
            # split the raw subject, so hyphenated or mixed-case subjects never matched.
            tokens = _tokenize(subject)
            denom = max(len(observed), len(tokens), 1)
            if len(observed & tokens) / denom >= self.similarity_threshold:
                return adr_id
        return None
```

Detection is only half the problem. The GovernanceDashboard rolls gaps up into a CategoryHealth score (0.0–1.0) per decision category, combining undocumented-decision counts with overdue reviews and expired ADRs, then exposes the similarity_score * governance_weight term the recommendation engine uses for ranking.

```python
from dataclasses import dataclass

# Continues from the detector snippet, which defines UndocumentedDecisionDetector.


@dataclass
class GovernanceDashboard:
    detector: UndocumentedDecisionDetector

    def category_health(self, category: str, observations: list[dict],
                        overdue_reviews: int, expired_adrs: int) -> float:
        gaps = [g for g in self.detector.detect_gaps(observations)
                if g.suggested_category == category]
        # Each gap, overdue review, or expired ADR costs 0.1, floored at 0.0.
        penalty = len(gaps) + overdue_reviews + expired_adrs
        return max(0.0, 1.0 - 0.1 * penalty)

    def weighted_similarity(self, similarity_score: float, health: float) -> float:
        return similarity_score * health  # feeds ADRRecommendationEngine ranking
```

A clean category returns 1.0. Each gap, overdue review, or expired ADR subtracts 0.1, so a category with seven such findings drops to about 0.3, dragging down any recommendation that leans on it. Verify by feeding a telemetry batch containing one production resource with no matching ADR: detect_gaps should return exactly one CRITICAL GovernanceGap, and category_health for that category should fall below 1.0 while a fully documented category stays at 1.0.
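The `overdue_reviews` and `expired_adrs` arguments have to come from somewhere. One way to derive them is sketched below, assuming each ADR carries a review due date and an expiry date. The `ADRReviewState` record, its field names, and the `count_review_debt` function are hypothetical, as is the choice to count an expired ADR only once rather than as both expired and overdue.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ADRReviewState:
    adr_id: str
    category: str
    review_due: datetime  # hypothetical field: next scheduled review
    expires_at: datetime  # hypothetical field: when the ADR stops being valid


def count_review_debt(records: list[ADRReviewState], category: str,
                      now: datetime) -> tuple[int, int]:
    """Return (overdue_reviews, expired_adrs) for one category."""
    overdue = expired = 0
    for record in records:
        if record.category != category:
            continue
        if record.expires_at <= now:
            expired += 1   # expired ADRs are not also counted as overdue
        elif record.review_due <= now:
            overdue += 1
    return overdue, expired


UTC = timezone.utc
now = datetime(2025, 1, 15, tzinfo=UTC)
records = [
    ADRReviewState("ADR-1", "model-selection",
                   datetime(2025, 1, 1, tzinfo=UTC), datetime(2025, 6, 1, tzinfo=UTC)),
    ADRReviewState("ADR-2", "model-selection",
                   datetime(2024, 6, 1, tzinfo=UTC), datetime(2024, 12, 31, tzinfo=UTC)),
    ADRReviewState("ADR-3", "retrieval",
                   datetime(2024, 6, 1, tzinfo=UTC), datetime(2024, 12, 31, tzinfo=UTC)),
]
overdue_reviews, expired_adrs = count_review_debt(records, "model-selection", now)
```

The two counts can then be passed straight into `category_health` alongside the telemetry observations.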

Do's and Don'ts

Do's

  1. Do cross-reference observed telemetry against declared ADR subjects using UndocumentedDecisionDetector.detect_gaps: quiet substitutions like a text-embedding-ada-002 to text-embedding-3-small swap can pass code review unnoticed and show up only in telemetry. Without this two-inventory diff, the RAG recall regression surfaces months later with no decision trail to audit.
  2. Do distinguish GapSeverity.CRITICAL (production) from GapSeverity.WARNING (non-production) at gap-creation time inside detect_gaps — severity is set once per observation so that category_health and dashboard displays can prioritize undocumented production resources without re-examining environment data downstream.
  3. Do pipe weighted_similarity(similarity_score, health), the similarity_score * health product, into ADRRecommendationEngine ranking. This is how expired ADRs and overdue reviews penalize future recommendations automatically: a category with a penalty of 7 drops to a health of about 0.3 (and reaches 0.0 at a penalty of 10), dragging any recommendation that leans on it far enough down to redirect architects toward better-governed alternatives.
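The penalty arithmetic behind the third item can be checked in isolation. This sketch restates the `max(0.0, 1.0 - 0.1 * penalty)` formula from category_health as a standalone function; the name `health_from_penalty` and the `step` parameter are introduced here for illustration only.

```python
def health_from_penalty(penalty: int, step: float = 0.1) -> float:
    """Same formula as category_health: lose `step` per finding, floor at 0.0."""
    return max(0.0, 1.0 - step * penalty)


for penalty in (0, 3, 7, 10, 12):
    # Rounded for display; binary floating point gives 0.2999... for penalty 7.
    print(penalty, round(health_from_penalty(penalty), 2))
# prints:
# 0 1.0
# 3 0.7
# 7 0.3
# 10 0.0
# 12 0.0
```

Because of floating-point rounding, comparisons against a cutoff such as 0.3 are safer with a small tolerance than with `==`.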

Don'ts

  1. Don't lower similarity_threshold below 0.85 in _match_adr without checking short subjects: the denominator is max(len(observed), len(tokens), 1), so for a two-token ADR subject like "embedding model" a single shared token already scores 0.5 against any two-token resource identifier. A threshold near that value matches unrelated resources, suppressing real gaps and producing a falsely clean category_health score.
  2. Don't omit overdue_reviews or expired_adrs from the category_health penalty — counting only undocumented-decision gaps lets an ADR that exists but has never been revalidated return health = 1.0, which feeds the ADRRecommendationEngine a clean governance signal for a category that is functionally unreviewed.
  3. Don't allow suggested_category to default to "unknown" without a resolution strategy: category_health filters gaps by exact category name, so any gap filed under "unknown" is invisible to every named-category penalty calculation and silently exempts undocumented decisions from dragging down weighted_similarity in the recommendation engine.
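The threshold sensitivity described in the first item can be explored with a standalone version of the overlap score. This is a sketch that mirrors the token-overlap ratio in _match_adr; the function name `overlap_score` and the sample identifiers are made up for illustration.

```python
def overlap_score(resource: str, subject: str) -> float:
    """Shared tokens divided by the size of the larger token set."""
    def tokens(text: str) -> set[str]:
        return set(text.lower().replace("-", " ").replace("_", " ").split())

    observed, declared = tokens(resource), tokens(subject)
    denom = max(len(observed), len(declared), 1)
    return len(observed & declared) / denom


# Same two tokens: a full match at any threshold.
exact = overlap_score("embedding-model", "embedding model")
# One shared token out of two: matches only if the threshold is 0.5 or lower.
partial = overlap_score("embedding-service", "embedding model")
# One shared token out of four: a longer identifier dilutes the score.
diluted = overlap_score("text-embedding-3-small", "embedding model")
```

With the default threshold of 0.85 only the first pair matches; at 0.5 the unrelated "embedding-service" would be treated as documented.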
