Prerequisites
This chapter assumes working proficiency with Python 3.11+, including type hints, dataclasses, and enum types. Students should have practical experience with REST API design and JSON schema validation. Familiarity with graph data structures (directed acyclic graphs, topological sorting) is expected, as the dependency graph engine builds on these concepts directly. Prior exposure to Architecture Decision Records (ADRs) in any format (Markdown-based or structured) is helpful but not required—the chapter introduces a purpose-built typed schema from scratch. No specific GenAI model provider accounts are needed; all examples use mock telemetry data.
Learning Goals
- Design a typed ADR schema that captures GenAI-specific decision categories, including model selection, hosting strategy, and RAG-vs-fine-tuning trade-offs, producing a machine-readable record format that goes far beyond traditional prose-based ADRs by encoding decision metadata, constraint boundaries, and expiration triggers directly into the schema itself.
Traditional Architecture Decision Records, as popularized by Michael Nygard's lightweight format, capture a title, status, context, decision, and consequences in a Markdown file stored alongside the codebase. While this approach works for conventional software architecture choices such as selecting a message broker or choosing between monolith and microservices, GenAI systems introduce decision categories that demand structured, queryable metadata rather than free-form prose. A GenAI ADR schema must encode the decision_category field as a typed enumeration — not an open string — so that tooling can distinguish between model-selection decisions, hosting-strategy decisions, retrieval-architecture decisions, and prompt-engineering decisions without relying on text parsing. The enumeration values you will define include MODEL_SELECTION, HOSTING_STRATEGY, RAG_VS_FINE_TUNING, PROMPT_ARCHITECTURE, GUARDRAIL_POLICY, and EMBEDDING_STRATEGY, each of which carries distinct constraint fields. For example, a MODEL_SELECTION ADR must include a model_candidates array with entries specifying provider, model identifier, parameter count tier, context window size, and pricing structure at the time of the decision, because these values drift rapidly — a model that was cost-optimal six months ago may now be deprecated or superseded by a cheaper alternative. The schema enforces that every ADR includes an expires_at timestamp, which is unusual for traditional ADRs but essential for GenAI decisions where model availability, pricing, and capability benchmarks shift on quarterly or even monthly cycles. You will implement this schema using Python dataclasses with runtime validation so that any ADR missing required fields raises a ValueError at creation time rather than silently producing an incomplete record. 
The status field itself must be an enumeration with values PROPOSED, ACCEPTED, REVIEW_REQUIRED, DEPRECATED, SUPERSEDED, and EXPIRED, where transitions between states follow explicit rules — for instance, an ADR can move from ACCEPTED to DEPRECATED only if a superseding ADR identifier is provided, preventing orphaned deprecations that leave teams without a clear current decision.
The schema must also encode constraint boundaries — quantitative thresholds that, if violated, automatically trigger a review of the decision. For a model-selection ADR, constraint boundaries include maximum acceptable p95 latency in milliseconds, maximum cost per thousand tokens, minimum acceptable quality score on the team's evaluation suite, and maximum acceptable hallucination rate as measured by the team's factuality benchmark. These boundaries are not merely documentation; they become machine-readable triggers that your telemetry validation system (covered in Goal 3) will compare against production metrics. When you design the ConstraintBoundary dataclass, each boundary must carry a metric_name string that exactly matches a key in your telemetry system, a threshold_value as a float, a comparison_operator enumeration (LESS_THAN, GREATER_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN_OR_EQUAL), and a violation_severity enumeration (WARNING, CRITICAL). A WARNING violation triggers a notification to the decision owner, while a CRITICAL violation moves the ADR status to REVIEW_REQUIRED automatically. This approach transforms ADRs from passive documentation into active governance instruments. You will also learn why the context field in a GenAI ADR should be structured rather than free-form: instead of a single prose block, the context splits into business_context (describing the product requirement driving the decision), technical_context (describing system constraints such as latency budgets, throughput requirements, and infrastructure limitations), and ai_specific_context (describing model capability requirements, safety considerations, and evaluation methodology). This three-part context structure ensures that future readers — or automated recommendation engines — can distinguish between decisions driven by cost optimization versus those driven by capability requirements, which is critical when a team needs to reassess decisions after a new model release.
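The ConstraintBoundary dataclass above can be sketched as follows; the `is_violated` helper is an assumption showing how the comparison operator would be evaluated against an observed metric:

```python
from dataclasses import dataclass
from enum import Enum

class ComparisonOperator(Enum):
    LESS_THAN = "<"
    GREATER_THAN = ">"
    LESS_THAN_OR_EQUAL = "<="
    GREATER_THAN_OR_EQUAL = ">="

class ViolationSeverity(Enum):
    WARNING = "warning"
    CRITICAL = "critical"

@dataclass(frozen=True)
class ConstraintBoundary:
    metric_name: str              # must exactly match a telemetry key
    threshold_value: float
    comparison_operator: ComparisonOperator
    violation_severity: ViolationSeverity

    def is_violated(self, observed: float) -> bool:
        # The boundary states what MUST hold; a violation is its negation.
        holds = {
            ComparisonOperator.LESS_THAN: observed < self.threshold_value,
            ComparisonOperator.GREATER_THAN: observed > self.threshold_value,
            ComparisonOperator.LESS_THAN_OR_EQUAL: observed <= self.threshold_value,
            ComparisonOperator.GREATER_THAN_OR_EQUAL: observed >= self.threshold_value,
        }[self.comparison_operator]
        return not holds
```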
The relationship between ADRs must also be encoded at the schema level through a dependencies field that lists ADR identifiers this decision depends on and a dependents field that is automatically populated when other ADRs reference this one. A hosting-strategy ADR (choosing between self-hosted inference on GPU instances versus API-based access through a provider) directly constrains which model-selection ADRs are valid — you cannot select a proprietary model accessible only through an API if your hosting-strategy ADR mandates self-hosted inference for data sovereignty reasons. By encoding these dependency relationships directly in the schema, you enable the dependency graph analysis covered in Goal 4. Each dependency entry must include not just the target ADR identifier but also a dependency_type enumeration (CONSTRAINS, ENABLES, CONFLICTS_WITH, SUPERSEDES) and an optional constraint_description string explaining the nature of the relationship. This level of formalism may seem heavy for teams accustomed to lightweight Markdown ADRs, but GenAI systems are uniquely susceptible to cascading decision invalidation — a single upstream change in model availability can ripple through hosting, retrieval, prompt design, and guardrail decisions, and without structured dependency metadata, teams discover these cascades only when production systems break.
Finally, you will learn to implement versioning semantics for ADR schemas themselves, because the schema will evolve as your organization's GenAI maturity increases. Every ADR record carries a schema_version field following semantic versioning. The ADR engine validates that it can process the schema version of any record it loads, raising a ValueError with a descriptive message if it encounters a schema version newer than its supported range. Schema migrations are handled by a SchemaRegistry class that maps version strings to migration functions, allowing older ADRs to be automatically upgraded to the current schema when loaded. This registry pattern — familiar from database migration frameworks like Alembic — prevents the common failure mode where teams stop updating old ADRs because the format has changed and manual conversion is tedious. The migration functions must be pure and deterministic: given the same input ADR at version N, they always produce the same output at version N+1, which makes them safe to apply automatically during reads without risking data corruption.
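The registry pattern can be sketched as below. The version strings, the example migration, and the single-step chaining logic are illustrative assumptions; the key properties from the text are preserved: migrations are pure (the input record is never mutated) and unsupported versions raise a descriptive ValueError:

```python
class SchemaRegistry:
    """Maps schema versions to pure migration functions, applied on load."""

    def __init__(self, current_version: str):
        self.current_version = current_version
        self._migrations = {}  # from_version -> (to_version, migration fn)

    def register(self, from_version: str, to_version: str, fn) -> None:
        self._migrations[from_version] = (to_version, fn)

    def upgrade(self, record: dict) -> dict:
        # Step the record forward one version at a time until current.
        version = record["schema_version"]
        while version != self.current_version:
            if version not in self._migrations:
                raise ValueError(f"unsupported schema version {version!r}")
            to_version, fn = self._migrations[version]
            record = fn(dict(record))          # pure: never mutate the input
            record["schema_version"] = to_version
            version = to_version
        return record

# Hypothetical migration: v1.1.0 made expires_at mandatory, so the
# migration backfills a sentinel value for older records.
registry = SchemaRegistry("1.1.0")
registry.register("1.0.0", "1.1.0",
                  lambda r: {**r, "expires_at": r.get("expires_at", "unset")})
```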
- Implement multi-provider model comparison workflows that produce scored decision matrices across cost, latency, and quality dimensions, building an evaluation harness that automates the collection, normalization, and weighted scoring of model performance data to eliminate subjective bias from model selection decisions.
Model selection in GenAI systems is one of the highest-impact architecture decisions a team makes, yet it is frequently driven by anecdote, vendor marketing, or individual engineer preference rather than systematic evaluation. The multi-provider comparison workflow you will build addresses this by defining a ModelEvaluationSuite class that encapsulates a set of evaluation tasks, a scoring rubric, and a set of candidate models to evaluate. Each evaluation task is defined by a TaskDefinition dataclass that includes a unique identifier, a natural language description, a set of input prompts, expected output characteristics (not exact matches, since GenAI outputs are non-deterministic, but structural and semantic properties), and a scoring function that maps a model's raw output to a numeric score between 0.0 and 1.0. The scoring functions must handle the non-determinism inherent in language model outputs by running each prompt multiple times (configurable via num_trials, defaulting to 5) and reporting both the mean score and standard deviation. A model with a mean quality score of 0.85 and standard deviation of 0.02 is far more reliable than one scoring 0.87 with a standard deviation of 0.15, and your workflow must surface this distinction rather than collapsing results to a single number. You will implement provider-specific adapter classes — OpenAIAdapter, AnthropicAdapter, GoogleAdapter, and SelfHostedAdapter — each implementing a common ModelProvider protocol that defines complete(prompt, config) and get_pricing_info() methods, so the evaluation harness can treat all providers uniformly while handling the authentication, rate limiting, and response parsing differences that exist between providers.
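The multi-trial scoring logic can be sketched as a small helper; the function name and signature are assumptions, and the provider call is passed in as a callable so the sketch stays provider-agnostic, in the spirit of the ModelProvider protocol:

```python
from statistics import mean, stdev

def score_task(complete, prompt, scoring_fn, num_trials=5):
    """Run `prompt` through `complete` num_trials times and map each raw
    output to a score in [0.0, 1.0] via `scoring_fn`.

    Returns (mean_score, stddev) so callers can distinguish a reliable
    0.85 +/- 0.02 model from an erratic 0.87 +/- 0.15 one."""
    scores = [scoring_fn(complete(prompt)) for _ in range(num_trials)]
    return mean(scores), (stdev(scores) if num_trials > 1 else 0.0)
```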
The WeightedCriteriaMatrix is the core data structure that transforms raw evaluation results into a ranked recommendation. The matrix defines dimensions — cost_per_1k_tokens, p50_latency_ms, p95_latency_ms, quality_score_mean, quality_score_stddev, context_window_tokens, throughput_tokens_per_second, and hallucination_rate — each with a weight between 0.0 and 1.0 that reflects the team's priorities. The weights must sum to 1.0, and the matrix constructor validates this constraint, raising a ValueError if the sum deviates by more than 0.001 (to account for floating-point imprecision). Normalization is critical: raw metric values span vastly different ranges (latency in milliseconds versus cost in fractional dollars versus quality scores between 0 and 1), so each dimension must be normalized to a 0-1 scale before weighting. You will implement two normalization strategies — min-max normalization that scales values linearly between the minimum and maximum observed across candidates, and z-score normalization that centers values around the mean with unit standard deviation — and learn when each is appropriate. Min-max normalization works well when the candidate set is fixed and you want relative rankings, while z-score normalization is more robust when comparing against historical baselines or when outlier candidates might skew the min-max range. The polarity of each dimension must also be configured: for cost and latency, lower values are better (polarity is MINIMIZE), while for quality score and throughput, higher values are better (polarity is MAXIMIZE). The matrix applies polarity inversion before weighting so that all normalized, weighted scores can be summed directly to produce a final composite score.
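A minimal sketch of the min-max strategy with polarity inversion and the weight-sum check; the function form (rather than a full WeightedCriteriaMatrix class) and the two-dimension example are simplifying assumptions:

```python
from enum import Enum

class Polarity(Enum):
    MINIMIZE = "minimize"   # lower raw values are better (cost, latency)
    MAXIMIZE = "maximize"   # higher raw values are better (quality, throughput)

def composite_scores(candidates, dimensions):
    """candidates: {name: {dim: raw_value}}; dimensions: {dim: (weight, Polarity)}.
    Min-max normalizes each dimension across candidates, inverts MINIMIZE
    dimensions, then sums the weighted normalized scores."""
    if abs(sum(w for w, _ in dimensions.values()) - 1.0) > 0.001:
        raise ValueError("dimension weights must sum to 1.0")
    scores = {name: 0.0 for name in candidates}
    for dim, (weight, polarity) in dimensions.items():
        values = [candidates[name][dim] for name in candidates]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0            # ties: avoid division by zero
        for name in candidates:
            norm = (candidates[name][dim] - lo) / span
            if polarity is Polarity.MINIMIZE:
                norm = 1.0 - norm          # polarity inversion before weighting
            scores[name] += weight * norm
    return scores
```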
The workflow must also capture evaluation provenance — a complete record of when the evaluation was run, which model versions were tested (not just model names, because providers update models behind stable identifiers), what prompts were used, what the raw outputs were, and what the environmental conditions were (time of day matters because provider latency varies with load). This provenance is stored as part of the resulting ADR's evidence field, linking the decision directly to reproducible evaluation data. Without provenance, model selection ADRs become stale opinions rather than evidence-based decisions. You will implement a ProvenanceCollector class that wraps the evaluation harness and automatically captures all inputs and outputs, writing them to a structured JSON artifact that can be referenced from the ADR. The collector also records the evaluation_cost — the total spend on API calls during the evaluation — which is important because thorough evaluations across many models and many trials can themselves become expensive, and teams need visibility into this cost to budget appropriately for periodic re-evaluation. The complete workflow — from defining tasks to running evaluations to generating a scored matrix to producing a draft ADR — is orchestrated by an EvaluationPipeline class whose run() method returns a ModelSelectionADR pre-populated with candidates, scores, evidence, constraint boundaries derived from the winning model's observed performance, and a recommended expires_at timestamp based on the typical model refresh cycle of the winning provider.
The comparison workflow must also handle the practical reality that not all models can be evaluated on all tasks. Some models may have context window limitations that prevent them from processing longer prompts in the evaluation suite, some may lack function-calling capabilities required by certain tasks, and some may have rate limits that prevent running the desired number of trials within a reasonable time window. Your EvaluationPipeline must gracefully handle these situations by marking affected cells in the criteria matrix as None rather than zero (which would unfairly penalize the model), and the WeightedCriteriaMatrix scoring logic must handle None values by either excluding that dimension from the model's score and re-normalizing weights, or by assigning a configurable default penalty score. The choice between these strategies is itself a configuration parameter on the matrix, because teams have different preferences: some prefer to penalize models that cannot complete all tasks, while others prefer to evaluate only on the tasks a model can actually perform. Both approaches are defensible, and your implementation must support both while making the choice explicit in the resulting ADR's evidence section.
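The first strategy — excluding None cells and re-normalizing the remaining weights — can be sketched as follows; the helper name and its operation on a single pre-normalized row are assumptions:

```python
def score_with_missing(row, weights):
    """row: {dim: normalized score in [0,1], or None if the model could not
    be evaluated on that dimension}; weights: {dim: weight}.

    None cells are excluded and the surviving weights re-normalized, so a
    model is scored only on the dimensions it could actually be tested on."""
    present = {d for d, v in row.items() if v is not None}
    if not present:
        raise ValueError("candidate has no evaluable dimensions")
    total_w = sum(weights[d] for d in present)
    return sum(weights[d] / total_w * row[d] for d in present)
```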
- Validate architecture decisions against production telemetry to detect stale or invalidated assumptions, building a continuous monitoring system that compares the constraint boundaries encoded in accepted ADRs against real-world metrics flowing from production inference pipelines.
The most dangerous ADR is one that was correct when written but has since become invalid due to changes in the production environment — model degradation, cost increases, traffic pattern shifts, or the availability of superior alternatives. Traditional ADR practices have no mechanism to detect this staleness; they rely on engineers remembering to revisit decisions periodically, which in practice means decisions are reviewed only when something breaks catastrophically. The telemetry validation system you will build closes this gap by implementing a DecisionValidator class that periodically (configurable via a cron schedule or event-driven trigger) reads all ACCEPTED ADRs, extracts their constraint boundaries, queries the production telemetry system for the corresponding metrics, and compares observed values against thresholds. The validator must handle the temporal dimension carefully: a constraint boundary on p95 latency should be compared against a rolling window of telemetry data (configurable, defaulting to 7 days) rather than a single point-in-time measurement, because transient spikes should not trigger false alarms while sustained degradation must be caught. You will implement windowed aggregation functions that compute the p50, p95, and p99 of each metric over the configured window, and the comparison logic evaluates the constraint against the appropriate percentile of the windowed data. For cost metrics, the aggregation should compute the trailing daily average rather than percentiles, because cost is an additive metric where the total matters more than the distribution shape.
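The windowed percentile computation can be sketched with the nearest-rank convention — one of several common percentile definitions, chosen here as an assumption for simplicity:

```python
import math

def windowed_percentile(samples, pct):
    """Nearest-rank percentile of the metric samples in a rolling window.

    E.g. pct=95 over 7 days of p95-latency samples gives the value to
    compare against a constraint boundary, so transient spikes do not
    trigger false alarms while sustained degradation is caught."""
    if not samples:
        raise ValueError("empty telemetry window")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]
```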
The telemetry integration layer must be pluggable because teams use different observability stacks. You will define a TelemetrySource protocol with methods query_metric(metric_name, time_range) returning a time series and list_available_metrics() returning the set of queryable metric names. Concrete implementations will include PrometheusTelemetrySource (querying a Prometheus server via its HTTP API using PromQL), DatadogTelemetrySource (querying Datadog's metrics API), and BigQueryTelemetrySource (running SQL queries against a metrics warehouse). The pluggable architecture means the ADR engine is not coupled to any specific observability vendor, which is especially important in organizations where different teams or environments use different monitoring tools. Each telemetry source implementation must handle authentication, pagination of large result sets, and graceful degradation when the telemetry system is temporarily unavailable — in that case, the validator logs a warning and skips validation for that cycle rather than marking ADRs as violated, because false positives from monitoring outages erode trust in the system. The metric_name strings in constraint boundaries must exactly match the metric names available in the configured telemetry source, and the validator should perform a startup check that verifies all referenced metrics exist, logging a WARNING for any ADR whose constraint references a metric not found in the telemetry source, which typically indicates either a misconfiguration or a metric that was renamed or removed.
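The protocol and a startup check can be sketched as below; the in-memory source stands in for the Prometheus, Datadog, and BigQuery implementations, and its internals are assumptions:

```python
from typing import Protocol

class TelemetrySource(Protocol):
    def query_metric(self, metric_name: str, time_range: tuple) -> list[float]: ...
    def list_available_metrics(self) -> set[str]: ...

class InMemoryTelemetrySource:
    """Mock source used in place of a real observability backend."""
    def __init__(self, series: dict[str, list[float]]):
        self._series = series

    def query_metric(self, metric_name, time_range):
        return self._series[metric_name]

    def list_available_metrics(self):
        return set(self._series)

def startup_check(source: TelemetrySource, referenced: set[str]) -> set[str]:
    """Return referenced metric names missing from the telemetry source,
    so the validator can log a WARNING for each misconfigured constraint."""
    return referenced - source.list_available_metrics()
```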
When a constraint boundary is violated, the validator must produce a ValidationResult object that captures the ADR identifier, the violated constraint, the observed metric values over the evaluation window, the threshold that was breached, the severity level (WARNING or CRITICAL), and a timestamp. For WARNING severity, the validator dispatches a notification to the ADR's decision_owner (a field in the ADR schema you defined in Goal 1) via a configurable notification channel — Slack webhook, email, or PagerDuty depending on the team's incident management setup. For CRITICAL severity, the validator additionally transitions the ADR's status from ACCEPTED to REVIEW_REQUIRED and creates a review task in the team's project management system. You will implement a NotificationDispatcher class with a pluggable backend architecture mirroring the telemetry source design, supporting SlackNotifier, EmailNotifier, and PagerDutyNotifier implementations. The notification message must be actionable: it should include not just the fact that a violation occurred but also the specific numbers (observed value versus threshold), a link to the relevant telemetry dashboard, a link to the ADR itself, and a suggested action (re-evaluate the decision using the evaluation pipeline from Goal 2, or update the constraint boundaries if the original thresholds were too aggressive). This level of detail in notifications is what separates a useful governance system from an alert fatigue generator.
The validation system must also detect ADR expiration independently of constraint violations. Every ADR carries an expires_at timestamp, and the validator must flag ADRs approaching expiration (configurable warning window, defaulting to 30 days before expiry) as well as those already past their expiration date. Expired ADRs should be automatically transitioned to EXPIRED status, and the decision owner should be notified with a prompt to either renew the decision (re-running the evaluation pipeline with current data) or supersede it with a new decision. The expiration check runs alongside the constraint validation in the same periodic cycle. You will also implement a TrendAnalyzer that goes beyond simple threshold comparison by detecting directional trends in metrics — if p95 latency is still within bounds but has been increasing steadily over the past 4 weeks, the system should issue a proactive TREND_WARNING that alerts the team before a threshold is actually breached. The trend analyzer uses linear regression over the windowed time series data and triggers when the projected value (extrapolating the current trend forward by a configurable number of days) would breach the constraint boundary. This proactive detection gives teams time to investigate and address degradation before it impacts users, transforming the ADR system from a reactive alarm into a predictive governance tool.
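The trend projection reduces to ordinary least-squares over (day, value) points, extrapolated forward; the function name and the "breach from below" framing are assumptions:

```python
def projected_breach(points, threshold, horizon_days):
    """points: [(day, value)] over the analysis window.

    Fits a least-squares line and returns True if the projected value,
    `horizon_days` past the last observation, would meet or exceed
    `threshold` -- the condition for issuing a proactive TREND_WARNING."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    intercept = mean_y - slope * mean_x
    last_x = max(x for x, _ in points)
    projected = slope * (last_x + horizon_days) + intercept
    return projected >= threshold
```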
- Build dependency graphs across architecture decisions to enable impact analysis when upstream choices change, constructing a directed graph of ADR relationships that supports traversal queries like "if we change our hosting strategy, which downstream decisions are affected?" and "what is the minimum set of decisions we must re-evaluate if model X is deprecated?"
GenAI architectures are characterized by tight coupling between decisions that may appear independent on the surface. A hosting-strategy decision constrains model selection (self-hosted inference limits you to open-weight models), model selection constrains prompt architecture (different models have different system prompt conventions, tool-calling formats, and context window budgets), prompt architecture constrains guardrail implementation (some guardrail approaches require specific prompt structures), and guardrail choices constrain the RAG pipeline design (retrieval must operate within the token budget left after guardrail prompts consume their share of the context window). When any one of these decisions changes, the downstream decisions may become invalid, but without an explicit dependency graph, teams discover these cascading impacts through production failures rather than proactive analysis. The dependency graph you will build represents each ADR as a node and each dependency relationship as a directed edge, with edge types corresponding to the dependency_type enumeration from Goal 1: CONSTRAINS, ENABLES, CONFLICTS_WITH, and SUPERSEDES. The graph is implemented using an adjacency list representation within a DecisionGraph class that provides methods for adding nodes and edges, querying direct dependencies, and performing transitive impact analysis. The graph is populated automatically by scanning the dependencies field of every ADR in the registry, which means the graph stays current as new ADRs are added or existing ones are modified without requiring manual graph maintenance.
The primary query pattern on the dependency graph is impact analysis: given that a specific ADR is being reconsidered (due to telemetry violation, expiration, or a new model release), what other ADRs are potentially affected? You will implement a compute_impact_set(adr_id) method that performs a breadth-first traversal from the specified node, following outgoing CONSTRAINS and ENABLES edges to find all transitively dependent decisions. The result is an ImpactReport object that contains the set of affected ADR identifiers, organized by depth (how many hops from the changed decision), edge type, and current status. The depth information is critical for prioritization: decisions one hop away are directly affected and should be reviewed immediately, while decisions three hops away may only need review if intermediate decisions actually change. The impact report also includes a critical_path — the longest chain of dependencies from the changed decision — which helps teams understand the worst-case scope of a cascading change. You will also implement reverse impact analysis via a compute_dependency_chain(adr_id) method that follows incoming edges to answer the question "what upstream decisions does this ADR depend on, and are any of them currently in REVIEW_REQUIRED or EXPIRED status?" This reverse query is essential during ADR creation: before accepting a new model-selection decision, the system should verify that the hosting-strategy ADR it depends on is still in ACCEPTED status, not silently building on a potentially invalid foundation.
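The adjacency-list graph and the breadth-first impact query can be sketched together; returning a flat {adr_id: depth} mapping instead of a full ImpactReport object is a simplifying assumption:

```python
from collections import deque

class DecisionGraph:
    """Adjacency-list graph of ADR nodes and typed dependency edges."""

    def __init__(self):
        self._edges = {}  # adr_id -> list of (target_id, edge_type)

    def add_edge(self, source, target, edge_type):
        self._edges.setdefault(source, []).append((target, edge_type))
        self._edges.setdefault(target, [])

    def compute_impact_set(self, adr_id):
        """BFS from `adr_id`, following only the edge types that propagate
        impact (CONSTRAINS, ENABLES). Returns {affected_id: depth}, where
        depth is the hop count used for review prioritization."""
        depths, queue = {}, deque([(adr_id, 0)])
        while queue:
            node, depth = queue.popleft()
            for target, edge_type in self._edges.get(node, []):
                if edge_type in ("CONSTRAINS", "ENABLES") and target not in depths:
                    depths[target] = depth + 1
                    queue.append((target, depth + 1))
        return depths
```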
Cycle detection is a mandatory feature of the dependency graph because circular dependencies between ADRs indicate a modeling error that must be resolved before any meaningful impact analysis can occur. You will implement cycle detection using Tarjan's strongly connected components algorithm within the DecisionGraph class, and the add_edge() method must check for cycles after every insertion, raising a ValueError with a descriptive message listing the cycle path if a cycle would be created. In practice, cycles usually arise from imprecise dependency declarations — for example, a model-selection ADR might declare a dependency on a prompt-architecture ADR (because the model was evaluated with specific prompts) while the prompt-architecture ADR declares a dependency on the model-selection ADR (because the prompt design was tailored to the selected model). The correct resolution is to recognize that these decisions were made jointly and should either be merged into a single ADR or restructured so that one clearly precedes the other, with the later decision depending on the earlier one. Your cycle detection should produce error messages that help engineers understand and resolve the circular dependency rather than simply rejecting the edge insertion with a generic error.
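The full implementation uses Tarjan's algorithm as described above; this sketch shows only the simpler invariant behind the add_edge() check — adding an edge u → v creates a cycle exactly when u is already reachable from v:

```python
def would_create_cycle(edges, source, target):
    """edges: {node: [successor ids]}. Iterative DFS from `target` looking
    for `source`; if found, the proposed edge source -> target closes a cycle
    and add_edge() should raise ValueError with the offending path."""
    stack, seen = [target], set()
    while stack:
        node = stack.pop()
        if node == source:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return False
```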
The dependency graph must also support temporal queries that account for decision evolution over time. When an ADR is superseded, the old node is not removed from the graph; instead, it is marked as SUPERSEDED and a SUPERSEDES edge is added from the new ADR to the old one. This preserves the historical graph structure and enables queries like "show me all decisions that were affected by the original hosting-strategy ADR, including those that have since been updated." You will implement a GraphSnapshot class that can reconstruct the state of the dependency graph at any point in time by filtering nodes and edges based on their creation and status-transition timestamps. This temporal capability is valuable during post-incident reviews, where teams need to understand not just the current state of architecture decisions but the state that existed when an incident occurred. The snapshot reconstruction must be efficient: rather than replaying all events from the beginning, the GraphSnapshot uses a materialized current graph with a change log, applying reverse transformations to roll back to the requested timestamp. The change log records every node addition, edge addition, and status transition with a timestamp, providing a complete audit trail of how the organization's GenAI architecture decisions evolved. This audit trail itself becomes a valuable artifact for architecture reviews, showing patterns like frequently superseded decisions (indicating an area of high uncertainty that might benefit from a more flexible architecture) or long-lived decisions with no dependencies (indicating potentially isolated components that could be evaluated independently).
- Create governance dashboards that surface undocumented decisions and enforce review workflows with expiry tracking, building a comprehensive visibility layer that transforms the ADR registry from a passive document store into an active governance system that holds teams accountable for maintaining current, validated architecture decisions.
The governance dashboard serves two audiences with fundamentally different needs: individual engineers who need to understand current decisions affecting their work, and engineering leadership who need visibility into organizational decision health. For individual engineers, the dashboard must provide a Decision Explorer view that displays all ADRs relevant to a specific system or team, filtered by decision category, status, and recency. Each ADR in the explorer shows its current status with a color-coded indicator (green for ACCEPTED, yellow for REVIEW_REQUIRED, red for EXPIRED, gray for DEPRECATED), the time since last review, the constraint boundary health (all passing, some warnings, any critical violations), and the dependency count (how many other decisions depend on this one, indicating its blast radius if changed). The explorer supports drill-down into individual ADRs, showing the full decision record, evaluation evidence, telemetry validation history, and dependency graph neighborhood. You will implement the dashboard data layer as a GovernanceQueryService class that wraps the ADR registry, telemetry validator, and dependency graph to provide pre-computed views optimized for dashboard rendering. The query service materializes aggregate statistics — total ADRs by status, average age of accepted ADRs, number of constraint violations in the past 30 days, number of ADRs approaching expiration — and caches them with a configurable time-to-live to avoid expensive recomputation on every dashboard load.
For engineering leadership, the dashboard must provide an Organizational Health view that surfaces systemic governance gaps rather than individual ADR details. The most critical metric this view surfaces is the undocumented decision rate — an estimate of how many GenAI architecture decisions have been made without corresponding ADRs. Detecting undocumented decisions is inherently heuristic, and you will implement several detection strategies within an UndocumentedDecisionDetector class. The first strategy scans infrastructure-as-code and deployment configurations for model endpoint references, embedding model identifiers, and vector database connection strings that are not referenced by any ADR, flagging them as potential undocumented model-selection or embedding-strategy decisions. The second strategy analyzes code repositories for imports of provider-specific SDKs (such as the OpenAI client library, Anthropic SDK, or Google AI Platform client) and cross-references the discovered provider usage against existing model-selection ADRs to identify providers being used without a corresponding documented decision. The third strategy examines cost billing data (from cloud provider billing APIs) for inference-related charges that exceed a configurable threshold without a corresponding ADR — if a team is spending more than a certain dollar amount per month on a model API, there should be a documented decision explaining why that model was selected. None of these heuristics are perfect, and each will produce both false positives and false negatives, so the detector assigns a confidence_score between 0.0 and 1.0 to each finding and the dashboard allows leadership to triage findings, dismissing false positives (which trains the detector to reduce similar findings in the future) or escalating true positives into ADR creation tasks assigned to the responsible team.
The review workflow engine enforces that ADRs do not languish in REVIEW_REQUIRED status indefinitely. When an ADR transitions to REVIEW_REQUIRED (either from telemetry violation or approaching expiration), the ReviewWorkflowManager class creates a review task with a deadline (configurable, defaulting to 14 days), assigns it to the ADR's decision_owner, and begins tracking progress. If the review is not completed by the deadline, the workflow manager escalates by notifying the decision owner's manager (resolved from an organizational hierarchy integration) and extending the deadline by 7 days. If the extended deadline also passes without resolution, the ADR is flagged as a governance violation and surfaced prominently on the organizational health dashboard. The review itself follows a structured process: the reviewer must either reaffirm the current decision (updating the expires_at timestamp and optionally adjusting constraint boundaries based on current telemetry), supersede the decision (creating a new ADR and linking it via a SUPERSEDES relationship), or deprecate the decision (marking it as DEPRECATED with a justification). The workflow manager validates that the review outcome is complete — for example, a superseding action is rejected if the new ADR is not provided, and a reaffirmation is rejected if the expires_at timestamp is not extended beyond the current date. This structural enforcement prevents rubber-stamp reviews where engineers click "approve" without actually evaluating whether the decision is still valid. You will implement the workflow state machine using an explicit state enumeration (PENDING_REVIEW, IN_REVIEW, REVIEW_COMPLETE, ESCALATED, GOVERNANCE_VIOLATION) with transition rules that prevent invalid state changes, raising a ValueError if code attempts to transition a review from PENDING_REVIEW directly to REVIEW_COMPLETE without passing through IN_REVIEW.
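The workflow state machine can be sketched with an explicit allow-list of transitions. The text mandates that PENDING_REVIEW cannot jump directly to REVIEW_COMPLETE; the remaining entries in the allow-list are plausible assumptions:

```python
from enum import Enum

class ReviewState(Enum):
    PENDING_REVIEW = "pending_review"
    IN_REVIEW = "in_review"
    REVIEW_COMPLETE = "review_complete"
    ESCALATED = "escalated"
    GOVERNANCE_VIOLATION = "governance_violation"

# Allowed transitions; anything absent from this map raises ValueError.
ALLOWED = {
    ReviewState.PENDING_REVIEW: {ReviewState.IN_REVIEW, ReviewState.ESCALATED},
    ReviewState.IN_REVIEW: {ReviewState.REVIEW_COMPLETE, ReviewState.ESCALATED},
    ReviewState.ESCALATED: {ReviewState.IN_REVIEW, ReviewState.GOVERNANCE_VIOLATION},
}

class ReviewTask:
    def __init__(self):
        self.state = ReviewState.PENDING_REVIEW

    def transition(self, new_state: ReviewState) -> None:
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
```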
The dashboard must also provide a Decision Timeline view that visualizes the lifecycle of decisions over time, showing when each ADR was proposed, accepted, reviewed, and eventually superseded or deprecated. This timeline visualization helps teams identify patterns in their decision-making process — for example, if model-selection ADRs are being superseded every two months, the default expiration period for that decision category should probably be set to two months rather than the default six months, avoiding the governance overhead of repeatedly extending decisions that will inevitably be superseded. The timeline also surfaces decision velocity — the elapsed time from proposal to acceptance — which is a leading indicator of process health. If that interval is lengthening over time, it may indicate that the review process has become too bureaucratic and needs streamlining, or that the evaluation pipeline is too slow and needs optimization. You will implement the timeline data model as a sequence of DecisionEvent objects, each carrying an ADR identifier, event type (PROPOSED, ACCEPTED, REVIEW_TRIGGERED, REVIEW_COMPLETED, SUPERSEDED, DEPRECATED, EXPIRED), timestamp, and actor (the person or system that triggered the event). The timeline aggregation logic computes statistics per decision category and per team, enabling comparisons that reveal whether certain types of decisions or certain teams have governance patterns that differ from organizational norms. Finally, the dashboard must support export capabilities — generating PDF reports for architecture review boards, CSV exports for further analysis, and API endpoints for integration with other governance tools — ensuring that the ADR governance system is not an isolated silo but a component of the organization's broader engineering governance ecosystem.