Free lesson

Build defense-in-depth with layered guard chain

Orchestrate multiple injection detectors into a layered pipeline with short-circuit logic, confidence aggregation, and configurable thresholds.

~25 min read · Free to read — no subscription required.

Defense in depth

Introduction

When you rely on a single injection detector in production, one missed pattern or one adversarial bypass defeats your entire defense. The Lethal Trifecta framework addresses this by chaining multiple independent guards — pattern matchers, LLM-as-judge classifiers, and semantic scanners — into a pipeline where each layer compensates for the blind spots of the others. By the end of this lesson, you'll be able to implement a guard chain orchestrator with short-circuit logic, per-guard confidence weighting, and latency budget management that blocks prompt injection attempts across direct and indirect attack vectors.

Key Terminology

Lethal Trifecta — a defense-in-depth framework that chains three classes of independent guards — pattern matchers, LLM-as-judge classifiers, and semantic scanners — so that each layer's coverage compensates for the blind spots of the others.
Guard Priority — a numeric field in GuardConfig (lower value = higher priority) that determines execution order within the chain, ensuring cheap fast detectors run before latency-heavy ones like LLM-as-judge.
Short-Circuit Threshold — the per-guard short_circuit_threshold value in GuardConfig; when a guard's returned confidence meets or exceeds it, run_guard_chain immediately returns allowed=False without invoking any remaining guards.
Latency Budget — the budget_ms ceiling passed to run_guard_chain; if elapsed time plus a guard's timeout_ms would exceed this value, the orchestrator skips that guard and proceeds to aggregation on results collected so far.
Weighted Confidence Aggregation — the normalized weighted average computed by aggregate_confidence across all completed GuardResult scores, where each guard's contribution is scaled by its weight in GuardConfig, expressing differential trust across guard types.
Chain Threshold — the chain_threshold parameter compared against the final aggregated confidence score; inputs whose weighted score falls below it are allowed, those at or above are blocked.

Concepts

Why Independent Layers Outperform a Single Guard

No single injection detector is reliable enough for production. Pattern matchers are fast but miss novel or obfuscated attacks that don't match known signatures. LLM-as-judge classifiers generalize better but add tens to hundreds of milliseconds of latency and can themselves be misled by adversarial inputs crafted to confuse a language model. Semantic scanners catch embedding-space anomalies that pattern rules can never express, but their accuracy depends on how well the reference corpus covers your threat surface. Each detector has a distinct failure mode — and an attacker who studies your system needs to defeat only the weakest layer if it is your only layer.

The Lethal Trifecta framework makes independent failure modes the defense's structural advantage. When three detectors with different methodologies each have some probability of missing an attack, the probability that all three miss the same attack is the product of their individual miss rates — far lower than any one detector alone. The guard chain architecture (see Code Walkthrough) provides the orchestration machinery to wire these independent guards together while keeping latency predictable and each guard's judgment weighted by the team's calibrated trust in it.

Guard Ordering and Short-Circuit Logic

The orchestrator sorts guards by their priority field before evaluation begins. The design principle is cost asymmetry: a pattern matcher that completes in under 1 ms can definitively block a known injection string at a tiny fraction of the cost of an LLM call. Placing the cheapest, most decisive guards first means clear-cut attacks are stopped before expensive downstream layers are ever reached.

Short-circuit logic formalizes this. Each GuardConfig carries its own short_circuit_threshold. The moment a guard returns a confidence score at or above that value, run_guard_chain sets short_circuited=True and returns allowed=False — the remaining guards in the sorted list never execute. Only inputs that pass the fast detectors with low confidence continue through the full chain to aggregation.

Loading diagram...

Confidence Weighting and the Final Decision

When no guard triggers a short-circuit, all completed GuardResult objects are passed to aggregate_confidence. Each result's confidence score is multiplied by its guard's weight — a value set in GuardConfig that encodes calibrated trust. A thoroughly tested pattern guard might carry a weight of 0.3, a calibrated LLM-as-judge 0.5, and a newer experimental detector 0.2. Dividing the accumulated weighted sum by the total weight produces a normalized score between 0.0 and 1.0 regardless of how many guards completed.

That score is then compared against chain_threshold. The latency budget interacts with this step in an important way: if the budget expires before all guards run, aggregation proceeds over only the guards that finished. This means chain_threshold must be chosen with partial-result scenarios in mind — a conservative threshold compensates for cases where the slower, higher-signal guards were skipped to meet latency requirements.

Code Walkthrough

Now that you understand why layered defense outperforms any single guard, here is how the orchestrator and its confidence aggregation are implemented in code.

Guard Chain Orchestrator Design

The orchestrator manages a list of guards, each with a priority, a weight for confidence aggregation, and a short-circuit threshold. Guards execute in priority order. If a guard returns a confidence score above its short-circuit threshold, the chain immediately blocks the request without running subsequent guards. If no guard triggers a short-circuit, the orchestrator collects all results and computes a weighted confidence score.

Code snippetpython
1from __future__ import annotations
2from typing import Optional
3from pydantic import BaseModel, Field
4
5class GuardConfig(BaseModel):
6    guard_id: str
7    guard_type: str  # "pattern", "llm_judge", "guardrails", "document_scanner"
8    priority: int = Field(ge=0, description="Lower number = higher priority")
9    weight: float = Field(ge=0.0, le=1.0)
10    short_circuit_threshold: float = Field(ge=0.0, le=1.0)
11    timeout_ms: int = Field(default=1000)
12    enabled: bool = True
13
14class GuardResult(BaseModel):
15    guard_id: str
16    confidence: float
17    triggered: bool
18    latency_ms: float
19
20class GuardChainResult(BaseModel):
21    allowed: bool
22    total_confidence: float
23    guard_results: list[GuardResult]
24    short_circuited: bool = False
25    short_circuit_guard: Optional[str] = None
26    total_latency_ms: float
27
28def aggregate_confidence(
29    guard_results: list[GuardResult],
30    guard_configs: dict[str, GuardConfig],
31) -> float:
32    weighted_sum = 0.0
33    total_weight = 0.0
34    for result in guard_results:
35        config = guard_configs[result.guard_id]
36        weighted_sum += result.confidence * config.weight
37        total_weight += config.weight
38    return weighted_sum / total_weight if total_weight > 0 else 0.0

GuardConfig captures everything the orchestrator needs to schedule a guard: its execution priority, the weight applied to its confidence score during aggregation, the short-circuit threshold that triggers an immediate block, and a per-guard timeout so a slow detector cannot stall the pipeline indefinitely.

aggregate_confidence implements the weighted averaging step. It iterates over each completed GuardResult, looks up that guard's weight from the guard_configs dictionary, and accumulates weighted_sum and total_weight. Dividing at the end produces a normalized score between 0.0 and 1.0. This is what allows the Lethal Trifecta framework to express differential trust — a thoroughly tested pattern guard might carry a weight of 0.3, a calibrated LLM-as-judge 0.5, and a newer experimental detector 0.2, so the chain's final decision reflects the team's confidence in each layer.

Latency Budget Management

The guard chain enforces a total latency budget for the complete evaluation pipeline. If the remaining budget is insufficient to run the next guard, the orchestrator skips it and makes a decision based on guards that have already completed. This prevents the security pipeline from becoming a bottleneck at latency-sensitive endpoints.

Code snippetpython
1import time
2
3def run_guard_chain(
4    user_input: str,
5    guards: list[tuple[GuardConfig, callable]],
6    chain_threshold: float,
7    budget_ms: float,
8) -> GuardChainResult:
9    guard_configs = {cfg.guard_id: cfg for cfg, _ in guards}
10    results: list[GuardResult] = []
11    start = time.monotonic()
12
13    for config, evaluate in sorted(guards, key=lambda g: g[0].priority):
14        if not config.enabled:
15            continue
16        elapsed_ms = (time.monotonic() - start) * 1000
17        if elapsed_ms + config.timeout_ms > budget_ms:
18            break  # skip remaining guards to respect latency budget
19
20        t0 = time.monotonic()
21        confidence = evaluate(user_input)
22        latency = (time.monotonic() - t0) * 1000
23
24        result = GuardResult(
25            guard_id=config.guard_id,
26            confidence=confidence,
27            triggered=confidence >= config.short_circuit_threshold,
28            latency_ms=latency,
29        )
30        results.append(result)
31
32        if result.triggered:
33            return GuardChainResult(
34                allowed=False,
35                total_confidence=confidence,
36                guard_results=results,
37                short_circuited=True,
38                short_circuit_guard=config.guard_id,
39                total_latency_ms=(time.monotonic() - start) * 1000,
40            )
41
42    total_confidence = aggregate_confidence(results, guard_configs)
43    return GuardChainResult(
44        allowed=total_confidence < chain_threshold,
45        total_confidence=total_confidence,
46        guard_results=results,
47        total_latency_ms=(time.monotonic() - start) * 1000,
48    )

Guards are sorted by priority so the cheapest, fastest detectors — typically pattern matchers that complete in under 1 ms — run first. A high-confidence hit short-circuits immediately, avoiding the latency of LLM-as-judge or semantic scanner layers entirely. Only ambiguous inputs continue through the full chain and reach aggregate_confidence for a weighted decision against chain_threshold.

Confirm that calling run_guard_chain with a known injection string causes the returned GuardChainResult.allowed to be False and short_circuited to be True when the pattern guard fires, and that the total_latency_ms stays well under the configured budget_ms for inputs caught early in the chain.

Do's and Don'ts

Now that you have worked through the implementation, the practices below separate a durable approach from a fragile one.

Do's

✓Do assign priority values to guards so the fastest detectors run first — pattern matchers that complete in under 1 ms should carry the lowest priority integer in GuardConfig so run_guard_chain reaches a short-circuit decision before ever invoking a slower LLM-as-judge or semantic scanner, keeping total latency well under budget_ms for the majority of malicious inputs.
✓Do tune each guard's weight in GuardConfig to reflect your team's calibrated trust in that detector — aggregate_confidence produces a weighted average, so a thoroughly validated LLM-as-judge at weight=0.5 correctly dominates a newer experimental detector at weight=0.2, and the chain's final decision against chain_threshold represents actual differential confidence rather than a naive majority vote.
✓Do set a timeout_ms per guard and enforce the total budget_ms in run_guard_chain — the orchestrator skips any guard whose timeout_ms would push elapsed_ms past budget_ms, ensuring the security pipeline never becomes a latency bottleneck at production endpoints even when an LLM-as-judge call is slow.

Don'ts

✗Don't rely on a single guard and call it defense-in-depth — if a pattern matcher is your only layer, one adversarial bypass (a novel encoding or indirect injection via a retrieved document) returns allowed=True with no further check; the Lethal Trifecta chain exists precisely because each guard type has distinct blind spots the others compensate for.
✗Don't set all guard weight values equal when your detectors have meaningfully different precision — equal weights feed aggregate_confidence a flat average, erasing the signal from your most reliable classifier and allowing a high-confidence LLM-as-judge result to be diluted by a poorly calibrated experimental detector below the chain_threshold.
✗Don't omit the short_circuit_threshold check inside the priority loop — skipping the early-return branch in run_guard_chain forces every input, including blatant injections, through the full chain and into aggregate_confidence, adding unnecessary latency and giving subsequent guards a chance to lower the weighted score below chain_threshold when the first guard already had conclusive evidence to block.

Keep going with GenAI Security Engineering

Create a free account to track your progress and open this lesson in the full learning view. Subscribe to unlock the entire path — every goal, the hands-on labs, quizzes, and your verifiable skill graph — from . Cancel anytime.

Create a free account Subscribe — →