AI Alignment

13 episodes: 90-second audio overviews on AI alignment.

1:26

Overrefusal — when safety makes models too cautious

Excessive safety training causes refusal of clearly benign requests; calibrating the refusal boundary without compromising safety is a key alignment challenge.

AI Safety · AI Alignment · Generative AI · GenAI Explained · 2026-02-19

1:37

Hallucination mitigation — grounding, retrieval, verification

RAG, self-consistency checks, citation requirements, confidence calibration, and retrieval verification reduce but never fully eliminate hallucination.

AI Safety · AI Alignment · Generative AI · GenAI Explained · 2026-02-19

1:45

Why hallucinations happen — probability meets knowledge gaps

Models assign nonzero probability to every possible token, including wrong ones; gaps in training data and distributional shift make some fabrication inevitable.

AI Safety · AI Alignment · Generative AI · GenAI Explained · 2026-02-19

1:53

Types of hallucination — intrinsic vs extrinsic

Intrinsic hallucinations contradict the provided input; extrinsic hallucinations add unsupported claims from parametric memory — both undermine user trust.

AI Safety · AI Alignment · Generative AI · GenAI Explained · 2026-02-19

1:16

Hallucination — when GenAI confidently fabricates information

Models generate plausible but factually wrong content because they optimize for fluency and pattern completion, not factual accuracy.

AI Safety · AI Alignment · Generative AI · GenAI Explained · 2026-02-19

1:30

The alignment tax — capability cost of safety training

Safety training can sometimes reduce raw benchmark performance; minimizing this tax while maintaining strong alignment is an active area of research.