Free lesson
Build defense-in-depth with layered guard chain
Orchestrate multiple injection detectors into a layered pipeline with short-circuit logic, confidence aggregation, and configurable thresholds.
~25 min read · Free to read — no subscription required.
Defense in depth
Introduction
When you rely on a single injection detector in production, one missed pattern or one adversarial bypass defeats your entire defense. The Lethal Trifecta framework addresses this by chaining multiple independent guards — pattern matchers, LLM-as-judge classifiers, and semantic scanners — into a pipeline where each layer compensates for the blind spots of the others. By the end of this lesson, you'll be able to implement a guard chain orchestrator with short-circuit logic, per-guard confidence weighting, and latency budget management that blocks prompt injection attempts across direct and indirect attack vectors.
Key Terminology
- Lethal Trifecta — a defense-in-depth framework that chains three classes of independent guards — pattern matchers, LLM-as-judge classifiers, and semantic scanners — so that each layer's coverage compensates for the blind spots of the others.
- Guard Priority — a numeric field in
GuardConfig(lower value = higher priority) that determines execution order within the chain, ensuring cheap fast detectors run before latency-heavy ones like LLM-as-judge. - Short-Circuit Threshold — the per-guard
short_circuit_thresholdvalue inGuardConfig; when a guard's returned confidence meets or exceeds it,run_guard_chainimmediately returnsallowed=Falsewithout invoking any remaining guards. - Latency Budget — the
budget_msceiling passed torun_guard_chain; if elapsed time plus a guard'stimeout_mswould exceed this value, the orchestrator skips that guard and proceeds to aggregation on results collected so far. - Weighted Confidence Aggregation — the normalized weighted average computed by
aggregate_confidenceacross all completedGuardResultscores, where each guard's contribution is scaled by itsweightinGuardConfig, expressing differential trust across guard types. - Chain Threshold — the
chain_thresholdparameter compared against the final aggregated confidence score; inputs whose weighted score falls below it are allowed, those at or above are blocked.
Concepts
Why Independent Layers Outperform a Single Guard
No single injection detector is reliable enough for production. Pattern matchers are fast but miss novel or obfuscated attacks that don't match known signatures. LLM-as-judge classifiers generalize better but add tens to hundreds of milliseconds of latency and can themselves be misled by adversarial inputs crafted to confuse a language model. Semantic scanners catch embedding-space anomalies that pattern rules can never express, but their accuracy depends on how well the reference corpus covers your threat surface. Each detector has a distinct failure mode — and an attacker who studies your system needs to defeat only the weakest layer if it is your only layer.
The Lethal Trifecta framework makes independent failure modes the defense's structural advantage. When three detectors with different methodologies each have some probability of missing an attack, the probability that all three miss the same attack is the product of their individual miss rates — far lower than any one detector alone. The guard chain architecture (see Code Walkthrough) provides the orchestration machinery to wire these independent guards together while keeping latency predictable and each guard's judgment weighted by the team's calibrated trust in it.
Guard Ordering and Short-Circuit Logic
The orchestrator sorts guards by their priority field before evaluation begins. The design principle is cost asymmetry: a pattern matcher that completes in under 1 ms can definitively block a known injection string at a tiny fraction of the cost of an LLM call. Placing the cheapest, most decisive guards first means clear-cut attacks are stopped before expensive downstream layers are ever reached.
Short-circuit logic formalizes this. Each GuardConfig carries its own short_circuit_threshold. The moment a guard returns a confidence score at or above that value, run_guard_chain sets short_circuited=True and returns allowed=False — the remaining guards in the sorted list never execute. Only inputs that pass the fast detectors with low confidence continue through the full chain to aggregation.
Confidence Weighting and the Final Decision
When no guard triggers a short-circuit, all completed GuardResult objects are passed to aggregate_confidence. Each result's confidence score is multiplied by its guard's weight — a value set in GuardConfig that encodes calibrated trust. A thoroughly tested pattern guard might carry a weight of 0.3, a calibrated LLM-as-judge 0.5, and a newer experimental detector 0.2. Dividing the accumulated weighted sum by the total weight produces a normalized score between 0.0 and 1.0 regardless of how many guards completed.
That score is then compared against chain_threshold. The latency budget interacts with this step in an important way: if the budget expires before all guards run, aggregation proceeds over only the guards that finished. This means chain_threshold must be chosen with partial-result scenarios in mind — a conservative threshold compensates for cases where the slower, higher-signal guards were skipped to meet latency requirements.
Code Walkthrough
Now that you understand why layered defense outperforms any single guard, here is how the orchestrator and its confidence aggregation are implemented in code.
Guard Chain Orchestrator Design
The orchestrator manages a list of guards, each with a priority, a weight for confidence aggregation, and a short-circuit threshold. Guards execute in priority order. If a guard returns a confidence score above its short-circuit threshold, the chain immediately blocks the request without running subsequent guards. If no guard triggers a short-circuit, the orchestrator collects all results and computes a weighted confidence score.
Code snippetpython
1from __future__ import annotations 2from typing import Optional 3from pydantic import BaseModel, Field 4 5class GuardConfig(BaseModel): 6 guard_id: str 7 guard_type: str # "pattern", "llm_judge", "guardrails", "document_scanner" 8 priority: int = Field(ge=0, description="Lower number = higher priority") 9 weight: float = Field(ge=0.0, le=1.0) 10 short_circuit_threshold: float = Field(ge=0.0, le=1.0) 11 timeout_ms: int = Field(default=1000) 12 enabled: bool = True 13 14class GuardResult(BaseModel): 15 guard_id: str 16 confidence: float 17 triggered: bool 18 latency_ms: float 19 20class GuardChainResult(BaseModel): 21 allowed: bool 22 total_confidence: float 23 guard_results: list[GuardResult] 24 short_circuited: bool = False 25 short_circuit_guard: Optional[str] = None 26 total_latency_ms: float 27 28def aggregate_confidence( 29 guard_results: list[GuardResult], 30 guard_configs: dict[str, GuardConfig], 31) -> float: 32 weighted_sum = 0.0 33 total_weight = 0.0 34 for result in guard_results: 35 config = guard_configs[result.guard_id] 36 weighted_sum += result.confidence * config.weight 37 total_weight += config.weight 38 return weighted_sum / total_weight if total_weight > 0 else 0.0
GuardConfig captures everything the orchestrator needs to schedule a guard: its execution priority, the weight applied to its confidence score during aggregation, the short-circuit threshold that triggers an immediate block, and a per-guard timeout so a slow detector cannot stall the pipeline indefinitely.
aggregate_confidence implements the weighted averaging step. It iterates over each completed GuardResult, looks up that guard's weight from the guard_configs dictionary, and accumulates weighted_sum and total_weight. Dividing at the end produces a normalized score between 0.0 and 1.0. This is what allows the Lethal Trifecta framework to express differential trust — a thoroughly tested pattern guard might carry a weight of 0.3, a calibrated LLM-as-judge 0.5, and a newer experimental detector 0.2, so the chain's final decision reflects the team's confidence in each layer.
Latency Budget Management
The guard chain enforces a total latency budget for the complete evaluation pipeline. If the remaining budget is insufficient to run the next guard, the orchestrator skips it and makes a decision based on guards that have already completed. This prevents the security pipeline from becoming a bottleneck at latency-sensitive endpoints.
Code snippetpython
1import time 2 3def run_guard_chain( 4 user_input: str, 5 guards: list[tuple[GuardConfig, callable]], 6 chain_threshold: float, 7 budget_ms: float, 8) -> GuardChainResult: 9 guard_configs = {cfg.guard_id: cfg for cfg, _ in guards} 10 results: list[GuardResult] = [] 11 start = time.monotonic() 12 13 for config, evaluate in sorted(guards, key=lambda g: g[0].priority): 14 if not config.enabled: 15 continue 16 elapsed_ms = (time.monotonic() - start) * 1000 17 if elapsed_ms + config.timeout_ms > budget_ms: 18 break # skip remaining guards to respect latency budget 19 20 t0 = time.monotonic() 21 confidence = evaluate(user_input) 22 latency = (time.monotonic() - t0) * 1000 23 24 result = GuardResult( 25 guard_id=config.guard_id, 26 confidence=confidence, 27 triggered=confidence >= config.short_circuit_threshold, 28 latency_ms=latency, 29 ) 30 results.append(result) 31 32 if result.triggered: 33 return GuardChainResult( 34 allowed=False, 35 total_confidence=confidence, 36 guard_results=results, 37 short_circuited=True, 38 short_circuit_guard=config.guard_id, 39 total_latency_ms=(time.monotonic() - start) * 1000, 40 ) 41 42 total_confidence = aggregate_confidence(results, guard_configs) 43 return GuardChainResult( 44 allowed=total_confidence < chain_threshold, 45 total_confidence=total_confidence, 46 guard_results=results, 47 total_latency_ms=(time.monotonic() - start) * 1000, 48 )
Guards are sorted by priority so the cheapest, fastest detectors — typically pattern matchers that complete in under 1 ms — run first. A high-confidence hit short-circuits immediately, avoiding the latency of LLM-as-judge or semantic scanner layers entirely. Only ambiguous inputs continue through the full chain and reach aggregate_confidence for a weighted decision against chain_threshold.
Confirm that calling run_guard_chain with a known injection string causes the returned GuardChainResult.allowed to be False and short_circuited to be True when the pattern guard fires, and that the total_latency_ms stays well under the configured budget_ms for inputs caught early in the chain.
Do's and Don'ts
Now that you have worked through the implementation, the practices below separate a durable approach from a fragile one.
Do's
- ✓Do assign priority values to guards so the fastest detectors run first — pattern matchers that complete in under 1 ms should carry the lowest
priorityinteger inGuardConfigsorun_guard_chainreaches a short-circuit decision before ever invoking a slower LLM-as-judge or semantic scanner, keeping total latency well underbudget_msfor the majority of malicious inputs. - ✓Do tune each guard's
weightinGuardConfigto reflect your team's calibrated trust in that detector —aggregate_confidenceproduces a weighted average, so a thoroughly validated LLM-as-judge atweight=0.5correctly dominates a newer experimental detector atweight=0.2, and the chain's final decision againstchain_thresholdrepresents actual differential confidence rather than a naive majority vote. - ✓Do set a
timeout_msper guard and enforce the totalbudget_msinrun_guard_chain— the orchestrator skips any guard whosetimeout_mswould pushelapsed_mspastbudget_ms, ensuring the security pipeline never becomes a latency bottleneck at production endpoints even when an LLM-as-judge call is slow.
Don'ts
- ✗Don't rely on a single guard and call it defense-in-depth — if a pattern matcher is your only layer, one adversarial bypass (a novel encoding or indirect injection via a retrieved document) returns
allowed=Truewith no further check; the Lethal Trifecta chain exists precisely because each guard type has distinct blind spots the others compensate for. - ✗Don't set all guard
weightvalues equal when your detectors have meaningfully different precision — equal weights feedaggregate_confidencea flat average, erasing the signal from your most reliable classifier and allowing a high-confidence LLM-as-judge result to be diluted by a poorly calibrated experimental detector below thechain_threshold. - ✗Don't omit the
short_circuit_thresholdcheck inside the priority loop — skipping the early-returnbranch inrun_guard_chainforces every input, including blatant injections, through the full chain and intoaggregate_confidence, adding unnecessary latency and giving subsequent guards a chance to lower the weighted score belowchain_thresholdwhen the first guard already had conclusive evidence to block.
Keep going with GenAI Security Engineering
Create a free account to track your progress and open this lesson in the full learning view. Subscribe to unlock the entire path — every goal, the hands-on labs, quizzes, and your verifiable skill graph — from . Cancel anytime.