Implement ADR recommendation engine using historical outcomes

You will build an ADRRecommendationEngine that uses historical decision outcomes to suggest architecture choices for new scenarios, based on their similarity to past decisions.

Implement DecisionOutcome as a Pydantic model with these fields: outcome_id: str, adr_id: str, measured_at: datetime, success_label: OutcomeLabel (SUCCESS, PARTIAL, FAILURE), metrics_snapshot: dict[str, float] capturing key metrics at evaluation time (e.g., latency_p95_ms, error_rate, cost_per_day, quality_score), lessons_learned: str, labeler: str, review_notes: str, and time_to_evaluate_days: int.

Store outcomes in a PostgreSQL decision_outcomes table with columns outcome_id VARCHAR(64) PRIMARY KEY, adr_id VARCHAR(64) REFERENCES architecture_decisions(adr_id), measured_at TIMESTAMPTZ, success_label VARCHAR(16), metrics_snapshot JSONB, lessons_learned TEXT, labeler VARCHAR(64), and review_notes TEXT. Create index idx_outcomes_adr_label on (adr_id, success_label) for efficient outcome aggregation, and index idx_outcomes_measured_at on (measured_at) for time-range queries.

Build a track_outcome() endpoint at POST /api/v1/adrs/{adr_id}/outcomes that records an outcome after validating that the ADR exists and is in ACCEPTED status, and returns an OutcomeTrackingResponse containing outcome_id, total_outcomes_for_adr: int, and success_rate: float.

Implement embed_adr() to generate a vector embedding of each ADR's concatenated context, decision, and consequences text using openai.embeddings.create(model='text-embedding-3-small', dimensions=1536), passing the concatenated text as input. Store the result in an adr_embeddings table with columns adr_id VARCHAR(64) PRIMARY KEY, embedding vector(1536), embedded_at TIMESTAMPTZ, and text_hash VARCHAR(64); the text_hash is used for cache invalidation when ADR content changes.
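The two derived values in that spec, success_rate and text_hash, can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the helper names outcome_summary, adr_text_hash, and needs_reembedding are hypothetical; success_rate is assumed to count only SUCCESS labels (the spec does not say how PARTIAL should count); and SHA-256 is chosen because its hex digest is exactly 64 characters, which fits text_hash VARCHAR(64). The Pydantic models, endpoint routing, and database I/O are omitted.

```python
import hashlib
from collections import Counter
from enum import Enum
from typing import Optional


class OutcomeLabel(str, Enum):
    SUCCESS = "SUCCESS"
    PARTIAL = "PARTIAL"
    FAILURE = "FAILURE"


def outcome_summary(labels: list[OutcomeLabel]) -> tuple[int, float]:
    """Return (total_outcomes_for_adr, success_rate) for one ADR's outcomes.

    Assumption: only SUCCESS counts toward success_rate; PARTIAL counts as not successful.
    """
    total = len(labels)
    if total == 0:
        return 0, 0.0
    return total, Counter(labels)[OutcomeLabel.SUCCESS] / total


def adr_text_hash(context: str, decision: str, consequences: str) -> str:
    # SHA-256 hex digest is exactly 64 characters, matching text_hash VARCHAR(64).
    # Joining the three sections with blank lines is an assumption.
    text = "\n\n".join([context, decision, consequences])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reembedding(stored_hash: Optional[str], context: str, decision: str,
                      consequences: str) -> bool:
    # Re-embed when no embedding row exists or the ADR text changed since embedded_at.
    return stored_hash != adr_text_hash(context, decision, consequences)
```

Inside embed_adr(), the embeddings API would then only be called when needs_reembedding() returns True, which keeps unchanged ADRs from being re-embedded on every run.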
Build find_similar_decisions() to accept a new decision context string, embed it, and run a cosine similarity search against adr_embeddings:

SELECT ae.adr_id, ad.title, ad.category, ad.status, 1 - (ae.embedding <=> %s) AS similarity FROM adr_embeddings ae JOIN architecture_decisions ad ON ae.adr_id = ad.adr_id WHERE ad.status IN ('ACCEPTED', 'SUPERSEDED') ORDER BY ae.embedding <=> %s LIMIT 10

Because <=> is pgvector's cosine distance operator, 1 minus the distance is the cosine similarity. Enrich each result with outcome statistics from SELECT success_label, COUNT(*) FROM decision_outcomes WHERE adr_id = %s GROUP BY success_label. Return a SimilarDecisions Pydantic model with similar_adrs: list[SimilarADR], where each SimilarADR has adr_id, title, category, similarity, and outcome_stats: dict[str, int].

Implement generate_recommendation() to call client.messages.create() on an anthropic.Anthropic client, passing the similar ADRs, their outcomes, and success/failure patterns as context, and asking Claude to produce a RecommendationReport Pydantic model with recommended_option: str, confidence: float, supporting_evidence: list[str], risks: list[str], alternative_options: list[str], historical_success_rate: float, and key_lessons: list[str].

Build a ProactiveADRSuggester that monitors a technology_updates Redis stream via XREAD BLOCK 5000 (blocking for up to 5,000 ms per read) for new model releases or pricing changes, triggers generate_recommendation() when an update matches existing ADR categories by keyword similarity, and publishes the resulting suggestions to an adr_suggestions stream.

Emit an adr_recommendations_generated_total{category,confidence_bucket} Prometheus counter, an adr_recommendation_latency_seconds histogram, and an adr_proactive_suggestions_total{trigger_type} counter.
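The semantics of that query and its per-ADR enrichment can be mirrored in memory, which is handy as a test oracle for the SQL version. This is a hypothetical stdlib-only sketch, not the production path: StoredADR and plain dicts stand in for the joined rows and the SimilarADR model, and the embedding step is skipped, so the caller passes a precomputed query vector.

```python
import math
from dataclasses import dataclass, field


@dataclass
class StoredADR:
    """Hypothetical in-memory stand-in for one adr_embeddings JOIN architecture_decisions row."""
    adr_id: str
    title: str
    category: str
    status: str
    embedding: list[float]
    outcome_labels: list[str] = field(default_factory=list)  # success_label values


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # For non-zero vectors this equals 1 - (a <=> b), since <=> is cosine distance.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar_decisions(query_embedding: list[float], adrs: list[StoredADR],
                           limit: int = 10) -> list[dict]:
    # WHERE ad.status IN ('ACCEPTED', 'SUPERSEDED')
    eligible = [a for a in adrs if a.status in ("ACCEPTED", "SUPERSEDED")]
    # ORDER BY distance ascending is the same as similarity descending; then LIMIT.
    scored = sorted(((cosine_similarity(query_embedding, a.embedding), a) for a in eligible),
                    key=lambda pair: pair[0], reverse=True)[:limit]
    results = []
    for similarity, adr in scored:
        outcome_stats: dict[str, int] = {}
        for label in adr.outcome_labels:  # GROUP BY success_label with COUNT(*)
            outcome_stats[label] = outcome_stats.get(label, 0) + 1
        results.append({
            "adr_id": adr.adr_id,
            "title": adr.title,
            "category": adr.category,
            "similarity": similarity,
            "outcome_stats": outcome_stats,
        })
    return results
```

Each similarity is computed once and reused for both sorting and the returned record, mirroring how the SQL computes the distance in both the SELECT list and ORDER BY.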


Create governance dashboards that surface undocumented decisions and enforce review workflows with expiry tracking

Introduction

When you ship GenAI architecture without an active governance layer, undocumented model swaps quietly displace approved ones, expired ADRs keep anchoring downstream choices nobody has revalidated, and the cost surfaces months later as a regression no one can trace back to a decision — because no decision was ever recorded. By the end of this lesson, you'll be able to design a governance dashboard that detects undocumented decisions from telemetry gaps, enforces review workflows with expiry tracking, and aggregates per-category health scores that feed back into the ADRRecommendationEngine so poorly governed decisions contribute less to future recommendations.

Key Terminology

GovernanceGap: a record representing an undocumented architecture decision detected when production telemetry surfaces a model, endpoint, or retrieval strategy that has no matching ADR in the registry; carries severity, environment, first-seen timestamp, and suggested category for triage.
ADRRecommendationEngine: the downstream component (introduced in the lab objective) that ranks historical architecture decisions when suggesting new ones; consumes per-category health scores from the dashboard so poorly governed decisions contribute less to future recommendations via a similarity_score * governance_weight ranking term.
CategoryHealth: a per-decision-category aggregate score (0.0–1.0) produced by the GovernanceDashboard by combining undocumented-decision counts, overdue reviews, and expired ADRs within that category; it is the unit of governance signal that flows back into the recommendation engine.

Concepts

Connecting Governance to the Recommendation Engine

The health scores produced by GovernanceDashboard feed directly into the ADRRecommendationEngine from the lab objective. When the recommendation engine retrieves historical outcomes to suggest architecture choices, it weights each outcome by the governance health score of its decision's category. A past decision from a category scoring 0.3 (indicating expired reviews and undocumented related decisions) contributes less to future recommendations than one from a well-governed category scoring 0.95. This creates a virtuous cycle: teams that maintain their ADRs get better recommendations, which incentivizes governance compliance.
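The lesson does not spell out how outcomes are weighted. One simple reading, sketched here purely as an assumption, is a weighted average in which each historical outcome counts in proportion to its category's health score; the function name weighted_success_rate is hypothetical.

```python
def weighted_success_rate(outcomes: list[tuple[bool, float]]) -> float:
    """Health-weighted share of successful past decisions.

    Each tuple is (was_successful, category_health) for one historical outcome.
    """
    total_weight = sum(health for _, health in outcomes)
    if total_weight == 0:
        return 0.0  # no usable evidence
    return sum(health for ok, health in outcomes if ok) / total_weight
```

With one success from a category at 0.95 and one failure from a category at 0.3, the weighted rate is 0.95 / 1.25 = 0.76 rather than the unweighted 0.5, so the well-governed evidence dominates.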

The key integration point is the similarity_score * governance_weight multiplication in the recommendation engine's ranking function. By making governance health a first-class input to automated recommendations, you transform the dashboard from a compliance checkbox into a system that actively improves decision quality over time. Expired model selection ADRs do not just trigger Slack notifications—they degrade the recommendation engine's confidence in suggesting similar models, forcing teams to re-evaluate before the engine will confidently recommend that path again.
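To see how the similarity_score * governance_weight term reorders results, here is a minimal ranking sketch. Candidate and rank_candidates are hypothetical names, and governance_weight is assumed to be the CategoryHealth score of the candidate's category.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical: one historical ADR returned by the similarity search."""
    adr_id: str
    similarity_score: float
    governance_weight: float  # CategoryHealth of the ADR's category, 0.0 to 1.0


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    # Highest similarity_score * governance_weight first.
    return sorted(candidates,
                  key=lambda c: c.similarity_score * c.governance_weight,
                  reverse=True)
```

A highly similar ADR from a poorly governed category (0.92 × 0.3 = 0.276) ranks below a slightly less similar one from a healthy category (0.81 × 0.95 ≈ 0.77), which is the behavior the paragraph above describes.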

This tight coupling between ADR governance, telemetry validation, and the recommendation engine is what distinguishes a production-grade ADR system from a documentation template. The dashboard is not a view layer—it is the control plane for your organization's GenAI architecture decisions.

Code Walkthrough

Now that you have seen how governance health scores flow into the ADRRecommendationEngine, the next step is making those scores concrete: first detecting undocumented decisions from telemetry gaps, then aggregating them into the per-category health signal the recommendation engine consumes.

The most dangerous architecture decisions are the ones nobody wrote down. A developer quietly switches from text-embedding-ada-002 to text-embedding-3-small in a single commit, and your RAG pipeline's recall changes without any ADR capturing the rationale. The detector cross-references two inventories: the declared set (ADR subjects in the registry) and the observed set (model identifiers, endpoints, and retrieval strategies seen in production telemetry). Anything observed without a matching ADR becomes a GovernanceGap.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class GapSeverity(Enum):
    CRITICAL = "critical"   # production, no ADR
    WARNING = "warning"     # non-production, no ADR


@dataclass
class GovernanceGap:
    observed_resource: str
    environment: str
    first_seen: datetime
    severity: GapSeverity
    suggested_category: str


@dataclass
class UndocumentedDecisionDetector:
    # Declared inventory: ADR subject text (e.g. "text-embedding-3-small") -> adr_id.
    adr_subjects: dict[str, str] = field(default_factory=dict)
    similarity_threshold: float = 0.85

    def detect_gaps(self, observations: list[dict]) -> list[GovernanceGap]:
        # One gap per (resource, environment); repeated sightings only move
        # first_seen earlier, so a busy resource does not inflate the gap count.
        gaps: dict[tuple[str, str], GovernanceGap] = {}
        for obs in observations:
            resource = obs["resource_identifier"]
            if self._match_adr(resource) is not None:
                continue  # an ADR already documents this resource
            env = obs["environment"]
            seen_at = datetime.fromisoformat(obs["timestamp"])
            existing = gaps.get((resource, env))
            if existing is not None:
                existing.first_seen = min(existing.first_seen, seen_at)
                continue
            gaps[(resource, env)] = GovernanceGap(
                observed_resource=resource,
                environment=env,
                first_seen=seen_at,
                severity=(GapSeverity.CRITICAL if env == "production"
                          else GapSeverity.WARNING),
                suggested_category=obs.get("category", "unknown"),
            )
        return list(gaps.values())

    @staticmethod
    def _tokens(text: str) -> set[str]:
        # Normalize both inventories identically: lowercase, "-" and "_" as spaces.
        return set(text.lower().replace("-", " ").replace("_", " ").split())

    def _match_adr(self, resource: str) -> Optional[str]:
        observed = self._tokens(resource)
        for subject, adr_id in self.adr_subjects.items():
            tokens = self._tokens(subject)
            # Dividing by the larger token set means partial overlaps score low.
            denom = max(len(observed), len(tokens), 1)
            if len(observed & tokens) / denom >= self.similarity_threshold:
                return adr_id
        return None
```
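To get a feel for how strict the 0.85 threshold is, the overlap score can be computed on its own. The hypothetical overlap_score helper below uses the same intersection-over-larger-set formula as _match_adr and assumes the resource and the ADR subject are normalized the same way (lowercased, with "-" and "_" treated as spaces).

```python
def overlap_score(resource: str, subject: str) -> float:
    """Shared tokens divided by the larger token set, as in _match_adr."""
    def tokens(text: str) -> set[str]:
        return set(text.lower().replace("-", " ").replace("_", " ").split())

    observed, declared = tokens(resource), tokens(subject)
    return len(observed & declared) / max(len(observed), len(declared), 1)


pairs = [
    ("text-embedding-3-small", "text-embedding-3-small"),  # identical
    ("text-embedding-3-small", "text-embedding-ada-002"),  # 2 of 4 tokens shared
    ("gpt-4o-mini", "gpt-4o"),                              # 2 of 3 tokens shared
    ("embedding-service", "embedding model"),               # 1 of 2 tokens shared
]
for resource, subject in pairs:
    score = overlap_score(resource, subject)
    print(f"{resource} vs {subject}: {score:.2f} (match at 0.85: {score >= 0.85})")
```

Because the ada-002 and 3-small identifiers share only two of four tokens, an ADR for the old embedding model does not cover the new one, which is exactly the quiet swap the detector is meant to flag.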

Detection is only half the problem. The GovernanceDashboard rolls gaps up into a CategoryHealth score (0.0–1.0) per decision category, combining undocumented-decision counts with overdue reviews and expired ADRs, then exposes the similarity_score * governance_weight term the recommendation engine uses for ranking.

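```python
from dataclasses import dataclass

# Assumes UndocumentedDecisionDetector from the previous snippet is in scope.


@dataclass
class GovernanceDashboard:
    detector: UndocumentedDecisionDetector

    def category_health(self, category: str, observations: list[dict],
                        overdue_reviews: int, expired_adrs: int) -> float:
        # Only gaps filed under this exact category count against it.
        gaps = [g for g in self.detector.detect_gaps(observations)
                if g.suggested_category == category]
        # Each undocumented decision, overdue review, or expired ADR costs 0.1.
        penalty = len(gaps) + overdue_reviews + expired_adrs
        return max(0.0, 1.0 - 0.1 * penalty)  # clamp so health never goes negative

    def weighted_similarity(self, similarity_score: float, health: float) -> float:
        # The similarity_score * governance_weight term used by ADRRecommendationEngine ranking.
        return similarity_score * health
```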

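A clean category returns 1.0. Each undocumented decision, overdue review, or expired ADR subtracts 0.1, so a category with seven such issues (for example, expired reviews plus undocumented swaps) drops to about 0.3, dragging down any recommendation that leans on it, and ten or more issues floor it at 0.0. To verify, feed a telemetry batch containing one production resource, tagged with a category, that has no matching ADR: detect_gaps should return exactly one CRITICAL GovernanceGap, and category_health for that category should fall below 1.0 (to 0.9 when there are no overdue reviews or expired ADRs), while a fully documented category stays at 1.0.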
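The arithmetic behind those numbers is easy to check in isolation. This standalone category_health function is a hypothetical restatement of GovernanceDashboard.category_health that takes the gap count directly instead of running the detector.

```python
def category_health(gaps: int, overdue_reviews: int, expired_adrs: int) -> float:
    # Each undocumented decision, overdue review, or expired ADR costs 0.1; clamp at 0.0.
    penalty = gaps + overdue_reviews + expired_adrs
    return max(0.0, 1.0 - 0.1 * penalty)


print(category_health(0, 0, 0))            # 1.0: fully documented category
print(category_health(1, 0, 0))            # 0.9: one CRITICAL gap, as in the check above
print(round(category_health(2, 3, 2), 2))  # 0.3: seven issues
print(category_health(4, 4, 4))            # 0.0: twelve issues, clamped
```

The round() on the seven-issue case only tidies binary floating-point noise from 1.0 - 0.1 * 7; ranking code can use the raw value.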

Do's and Don'ts

Do's

✓Do cross-reference observed telemetry against declared ADR subjects using UndocumentedDecisionDetector.detect_gaps: quiet substitutions like a text-embedding-ada-002 → text-embedding-3-small swap can pass code review unnoticed but still surface in telemetry, and without this two-inventory diff the resulting RAG recall regression shows up months later with no decision trail to audit.
✓Do distinguish GapSeverity.CRITICAL (production) from GapSeverity.WARNING (non-production) at gap-creation time inside detect_gaps — severity is set once per observation so that category_health and dashboard displays can prioritize undocumented production resources without re-examining environment data downstream.
✓Do feed the weighted_similarity(similarity_score, health) product into ADRRecommendationEngine ranking: this is how expired ADRs and overdue reviews automatically penalize future recommendations. A category with seven issues drops to a health of about 0.3 (and ten or more floor it at 0.0), pulling any recommendation that leans on it far enough down to redirect architects toward better-governed alternatives.
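Because severity is fixed when a gap is created, a dashboard can order its triage queue from the gap records alone. A minimal sketch, assuming oldest-first ordering within each severity (the lesson does not specify one); Gap is a trimmed, hypothetical stand-in for GovernanceGap, and triage_order is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class GapSeverity(Enum):
    CRITICAL = "critical"   # production, no ADR
    WARNING = "warning"     # non-production, no ADR


@dataclass
class Gap:
    """Hypothetical trimmed stand-in for GovernanceGap."""
    observed_resource: str
    severity: GapSeverity
    first_seen: datetime


SEVERITY_RANK = {GapSeverity.CRITICAL: 0, GapSeverity.WARNING: 1}


def triage_order(gaps: list[Gap]) -> list[Gap]:
    # CRITICAL (production) first; within a severity, the longest-standing gap first.
    return sorted(gaps, key=lambda g: (SEVERITY_RANK[g.severity], g.first_seen))
```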

Don'ts

✗Don't lower similarity_threshold much below 0.85 in _match_adr: because the score divides the shared-token count by the larger token set, a two-token ADR subject like "embedding model" already scores 0.5 against any two-token resource identifier that shares one token with it (such as "embedding-service"), so a threshold of 0.5 would treat that unrelated resource as documented, suppress a real gap, and produce a falsely clean category_health score.
✗Don't omit overdue_reviews or expired_adrs from the category_health penalty: counting only undocumented-decision gaps lets a category whose ADRs exist but have never been revalidated score health = 1.0, feeding the ADRRecommendationEngine a clean governance signal for a category that is functionally unreviewed.
✗Don't allow suggested_category to default to "unknown" without a resolution strategy — category_health filters gaps by exact category name, so any gap filed under "unknown" is invisible to every named-category penalty calculation and silently exempts undocumented decisions from dragging down weighted_similarity in the recommendation engine.
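One possible resolution strategy, sketched under assumptions: infer a category from resource tokens with a keyword table before falling back, and route anything still unresolved to an explicit bucket the dashboard displays instead of "unknown". The CATEGORY_KEYWORDS table, its category names, the needs_triage bucket, and resolve_category itself are all hypothetical; real categories should come from your ADR registry.

```python
from typing import Optional

# Hypothetical keyword table; real categories come from your ADR registry.
CATEGORY_KEYWORDS: dict[str, set[str]] = {
    "embedding_model": {"embedding", "embed"},
    "llm_provider": {"gpt", "claude", "llama", "mistral"},
    "retrieval_strategy": {"rerank", "bm25", "hybrid", "retriever"},
}


def resolve_category(resource: str, reported: Optional[str] = None) -> str:
    # Trust a category the telemetry already reports.
    if reported and reported != "unknown":
        return reported
    tokens = set(resource.lower().replace("-", " ").replace("_", " ").split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if tokens & keywords:
            return category
    # Still unresolved: surface it in a visible triage bucket, not "unknown".
    return "needs_triage"
```

Calling resolve_category(obs["resource_identifier"], obs.get("category")) in place of obs.get("category", "unknown") inside detect_gaps would route most gaps into named categories where category_health can count them, and leave a needs_triage count the dashboard can show on its own.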
