All Topics

Generative AI

119 episodes — 90-second audio overviews on generative ai.

LLM layers — architecture of a large language model
1:45

LLM layers — architecture of a large language model

A large language model is a deep stack of identical Transformer layers: early layers capture grammar, middle layers grasp semantics, and deep layers handle reasoning and world knowledge.

Large Language ModelsTransformersAI ArchitectureGenerative AI2026-02-21
ControlNet — adding spatial conditioning
1:55

ControlNet — adding spatial conditioning

Injecting structural control signals (edge maps, human poses, depth maps) alongside text prompts for precise spatial layout control over the generated image.

Image GenerationGenerative AIGenAI ExplainedAI Podcast2026-02-19
Classifier-free guidance (CFG) — controlling prompt adherence
1:36

Classifier-free guidance (CFG) — controlling prompt adherence

Blending conditional (text-guided) and unconditional predictions during generation; higher CFG values follow the text prompt more strictly at the cost of diversity.

Image GenerationGenerative AIGenAI ExplainedAI Podcast2026-02-19
CLIP guidance — text-image alignment for generation
1:35

CLIP guidance — text-image alignment for generation

OpenAI's CLIP model provides a shared text-image embedding space that steers the diffusion process toward images matching a text description.

Image GenerationGenerative AIGenAI ExplainedAI Podcast2026-02-19
Diffusion Transformers (DiT) — replacing U-Net with transformers
1:53

Diffusion Transformers (DiT) — replacing U-Net with transformers

Using transformer blocks instead of U-Net for the denoising network — powers Sora, Flux, and SD3, offering better scaling and quality at large sizes.

Image GenerationGenerative AIGenAI ExplainedAI Podcast2026-02-19
Latent diffusion — diffusing in compressed space
1:43

Latent diffusion — diffusing in compressed space

Running the diffusion process in a VAE's latent space (64x smaller than pixel space) rather than on raw pixels, making generation fast and memory-efficient.

Image GenerationGenerative AIGenAI ExplainedAI Podcast2026-02-19
U-Net — the denoising backbone
1:19

U-Net — the denoising backbone

An encoder-decoder convolutional network with skip connections that predicts the noise to remove at each diffusion step — the workhorse architecture of Stable Diffusion 1.x and 2.x.

Image GenerationGenerative AIGenAI ExplainedAI Podcast2026-02-19
Noise schedules — controlling how noise is added
2:04

Noise schedules — controlling how noise is added

Linear, cosine, or learned schedules define how much noise is injected at each of the T timesteps — directly impacting generation quality and training stability.

Image GenerationGenerative AIGenAI ExplainedAI Podcast2026-02-19
The diffusion process — forward noise, reverse denoise
1:15

The diffusion process — forward noise, reverse denoise

Forward process: gradually add Gaussian noise over many steps until the image becomes pure static. Reverse process: learn to undo each step, recovering a clean image.

Image GenerationGenerative AIGenAI ExplainedAI Podcast2026-02-19
Why diffusion won — comparing generative architectures
1:58

Why diffusion won — comparing generative architectures

Diffusion models offer stable training, mode coverage, better diversity, and higher fidelity than GANs, which is why they replaced GANs as the dominant approach for image and video generation.

Image GenerationAI BasicsGenerative AIGenAI Explained2026-02-19
Normalizing Flows — invertible generation with exact likelihoods
1:40

Normalizing Flows — invertible generation with exact likelihoods

Chains of invertible mathematical transformations that map simple distributions to complex ones, offering exact probability computation unlike GANs or VAEs.

Image GenerationAI BasicsGenerative AIGenAI Explained2026-02-19
GAN challenges — mode collapse and training instability
1:55

GAN challenges — mode collapse and training instability

GANs are notoriously difficult to train: the generator may produce limited variety (mode collapse), and the adversarial balance is fragile and sensitive to hyperparameters.

Image GenerationAI BasicsGenerative AIGenAI Explained2026-02-19
GAN applications — StyleGAN, deepfakes, super-resolution
1:25

GAN applications — StyleGAN, deepfakes, super-resolution

GANs powered photorealistic face generation (StyleGAN), image enhancement (ESRGAN), and synthetic media — the dominant GenAI paradigm before diffusion.

Image GenerationAI BasicsGenerative AIGenAI Explained2026-02-19
GANs — generator vs discriminator competition
1:19

GANs — generator vs discriminator competition

Two networks in adversarial training: a generator creates fakes, a discriminator detects them — the competition drives both to improve, producing increasingly realistic outputs.

Image GenerationAI BasicsGenerative AIGenAI Explained2026-02-19
Variational Autoencoders (VAEs) — generating from learned distributions
1:39

Variational Autoencoders (VAEs) — generating from learned distributions

Unlike basic autoencoders, VAEs encode inputs as probability distributions, enabling smooth interpolation between examples and sampling of entirely new outputs.

Image GenerationAI BasicsGenerative AIGenAI Explained2026-02-19
Latent space — the compressed world where generation happens
1:40

Latent space — the compressed world where generation happens

The bottleneck layer in an autoencoder where high-dimensional data (images, text) is compressed into a dense, navigable, lower-dimensional representation.

Image GenerationAI BasicsGenerative AIGenAI Explained2026-02-19
Autoencoders — compressing and reconstructing data
1:38

Autoencoders — compressing and reconstructing data

Neural networks that learn to encode input into a compact bottleneck representation and decode it back — the architectural foundation of latent space.

Image GenerationAI BasicsGenerative AIGenAI Explained2026-02-19
Coding benchmarks — HumanEval, SWE-bench, MBPP
1:18

Coding benchmarks — HumanEval, SWE-bench, MBPP

Standard evaluations measuring code generation quality: from simple function completion (HumanEval) to resolving real GitHub issues (SWE-bench).

AI Code GenerationPrompt EngineeringGenerative AIGenAI Explained2026-02-19
Repository-level code understanding — beyond single files
1:29

Repository-level code understanding — beyond single files

Models that navigate imports, call graphs, type systems, and project structure to generate contextually correct changes spanning multiple files.

AI Code GenerationPrompt EngineeringGenerative AIGenAI Explained2026-02-19
Code execution feedback — running code to self-correct
1:37

Code execution feedback — running code to self-correct

Agents that generate code, execute it in a sandbox, read error messages, and iteratively fix bugs until all tests pass — closing the generate-test loop.

AI Code GenerationPrompt EngineeringGenerative AIGenAI Explained2026-02-19
Code generation from natural language — describing what you want
1:37

Code generation from natural language — describing what you want

Translating English descriptions into working functions, classes, and scripts — the core use case driving AI-assisted software development.

AI Code GenerationPrompt EngineeringGenerative AIGenAI Explained2026-02-19
Fill-in-the-middle (FIM) — bidirectional code completion
1:23

Fill-in-the-middle (FIM) — bidirectional code completion

Training models to predict missing code given both the prefix and suffix context, powering the inline autocomplete experience in editors like Copilot and Cursor.

AI Code GenerationPrompt EngineeringGenerative AIGenAI Explained2026-02-19
Code LLMs — models specialized for programming
1:35

Code LLMs — models specialized for programming

Codex, CodeLlama, StarCoder, DeepSeek Coder — models trained on massive code corpora that understand syntax, APIs, libraries, and programming patterns.

AI Code GenerationPrompt EngineeringGenerative AIGenAI Explained2026-02-19
Meta-prompting — LLMs writing better prompts
1:20

Meta-prompting — LLMs writing better prompts

Using one LLM to generate, evaluate, and iteratively optimize prompts for another model, automating the prompt engineering process itself.

Prompt EngineeringGenerative AIGenAI ExplainedAI Podcast2026-02-19
Prompt chaining — multi-step workflows across prompts
1:40

Prompt chaining — multi-step workflows across prompts

Decomposing complex tasks into sequential prompt calls where each step's output feeds as context into the next step's input.

Prompt EngineeringGenerative AIGenAI ExplainedAI Podcast2026-02-19
Structured output prompting — JSON and schema-constrained generation
1:19

Structured output prompting — JSON and schema-constrained generation

Techniques and instructions that force LLM output into machine-parseable formats for reliable downstream integration with software systems.

Prompt EngineeringGenerative AIGenAI ExplainedAI Podcast2026-02-19
Tree of Thoughts — branching solution exploration
1:24

Tree of Thoughts — branching solution exploration

The model generates multiple reasoning paths, evaluates each branch, and prunes bad directions — systematic search over the space of possible solutions.

Prompt EngineeringGenerative AIGenAI ExplainedAI Podcast2026-02-19
ReAct — interleaving reasoning with action
1:32

ReAct — interleaving reasoning with action

A prompting framework where the model alternates between thinking about what to do (Reason), taking actions (tool calls), and processing observations.

Prompt EngineeringGenerative AIGenAI ExplainedAI Podcast2026-02-19
Chain-of-thought (CoT) — step-by-step reasoning
1:27

Chain-of-thought (CoT) — step-by-step reasoning

Adding "Let's think step by step" or showing worked reasoning dramatically improves accuracy on math, logic, and multi-step problems.

Prompt EngineeringGenerative AIGenAI ExplainedAI Podcast2026-02-19
Zero-shot prompting — instructions without examples
1:40

Zero-shot prompting — instructions without examples

Relying entirely on the model's pre-trained knowledge and instruction tuning by providing only a clear, specific task description.

Prompt EngineeringGenerative AIGenAI ExplainedAI Podcast2026-02-19
Few-shot prompting — teaching by example in context
1:16

Few-shot prompting — teaching by example in context

Including 2-5 input/output examples directly in the prompt so the model infers the desired pattern and applies it to new inputs without any training.

Prompt EngineeringGenerative AIGenAI ExplainedAI Podcast2026-02-19
System prompts — persistent behavioral instructions
1:29

System prompts — persistent behavioral instructions

Hidden instructions prepended to every conversation turn that define persona, rules, output format, tool access, and behavioral boundaries.

Prompt EngineeringGenerative AIGenAI ExplainedAI Podcast2026-02-19
Prompt engineering — designing inputs for desired outputs
1:19

Prompt engineering — designing inputs for desired outputs

The practice of crafting structured prompts that reliably guide LLMs to produce accurate, well-formatted, and useful responses.

Prompt EngineeringGenerative AIGenAI ExplainedAI Podcast2026-02-19
Streaming & SSE — delivering tokens as they generate
1:37

Streaming & SSE — delivering tokens as they generate

Server-Sent Events push each token to the client immediately as it's produced, creating the live typing experience users expect from chat interfaces.

AI InferenceGenerative AIGenAI ExplainedAI Podcast2026-02-19
Structured decoding — constraining output to valid formats
1:48

Structured decoding — constraining output to valid formats

Grammar-based or JSON-schema-based constraints that guarantee output is syntactically valid JSON, SQL, XML, or other structured formats.

AI InferenceGenerative AIGenAI ExplainedAI Podcast2026-02-19
Logit bias — steering toward or away from specific tokens
1:32

Logit bias — steering toward or away from specific tokens

Manually adjusting individual token log-probabilities before sampling to encourage or suppress particular words, formats, or languages.

AI InferenceGenerative AIGenAI ExplainedAI Podcast2026-02-19
Stop sequences — controlling when generation halts
1:38

Stop sequences — controlling when generation halts

Defined strings or token IDs that trigger immediate generation termination, giving precise programmatic control over output boundaries.

AI InferenceGenerative AIGenAI ExplainedAI Podcast2026-02-19
Repetition penalty — preventing degenerate loops
1:38

Repetition penalty — preventing degenerate loops

Reducing the probability of recently generated tokens to avoid the repetitive patterns that plague naive decoding strategies.

AI InferenceGenerative AIGenAI ExplainedAI Podcast2026-02-19
Beam search — exploring multiple generation paths
1:22

Beam search — exploring multiple generation paths

Maintaining the k highest-probability partial sequences at each step; produces higher-likelihood outputs but less diverse text than sampling.

AI InferenceGenerative AIGenAI ExplainedAI Podcast2026-02-19
Top-p (nucleus) sampling — dynamic probability cutoff
1:45

Top-p (nucleus) sampling — dynamic probability cutoff

Tokens are included in the candidate set until their cumulative probability reaches threshold p, adapting the pool size to the model's confidence.

AI InferenceGenerative AIGenAI ExplainedAI Podcast2026-02-19
Top-k sampling — fixed-size candidate filtering
1:18

Top-k sampling — fixed-size candidate filtering

Only the k most probable next tokens are considered before sampling, filtering out the long tail of unlikely noise.

AI InferenceGenerative AIGenAI ExplainedAI Podcast2026-02-19
Temperature — controlling randomness in generation
1:40

Temperature — controlling randomness in generation

A scaling factor applied to logits before softmax: temperature=0 always picks the top token (greedy), higher values spread probability across more candidates.

AI InferenceGenerative AIGenAI ExplainedAI Podcast2026-02-19
Autoregressive decoding — generating one token at a time
1:23

Autoregressive decoding — generating one token at a time

The model produces tokens sequentially, each conditioned on all previous tokens, until hitting a stop token or length limit.

AI InferenceGenerative AIGenAI ExplainedAI Podcast2026-02-19
Overrefusal — when safety makes models too cautious
1:26

Overrefusal — when safety makes models too cautious

Excessive safety training causes refusal of clearly benign requests; calibrating the refusal boundary without compromising safety is a key alignment challenge.

AI SafetyAI AlignmentGenerative AIGenAI Explained2026-02-19
Hallucination mitigation — grounding, retrieval, verification
1:37

Hallucination mitigation — grounding, retrieval, verification

RAG, self-consistency checks, citation requirements, confidence calibration, and retrieval verification reduce but never fully eliminate hallucination.

AI SafetyAI AlignmentGenerative AIGenAI Explained2026-02-19
Why hallucinations happen — probability meets knowledge gaps
1:45

Why hallucinations happen — probability meets knowledge gaps

Models assign probability to all possible tokens including wrong ones; gaps in training data and distributional shift make some fabrication inevitable.

AI SafetyAI AlignmentGenerative AIGenAI Explained2026-02-19
Types of hallucination — intrinsic vs extrinsic
1:53

Types of hallucination — intrinsic vs extrinsic

Intrinsic hallucinations contradict the provided input; extrinsic hallucinations add unsupported claims from parametric memory — both undermine user trust.

AI SafetyAI AlignmentGenerative AIGenAI Explained2026-02-19
Hallucination — when GenAI confidently fabricates information
1:16

Hallucination — when GenAI confidently fabricates information

Models generate plausible but factually wrong content because they optimize for fluency and pattern completion, not truth or accuracy.

AI SafetyAI AlignmentGenerative AIGenAI Explained2026-02-19
The alignment tax — capability cost of safety training
1:30

The alignment tax — capability cost of safety training

Safety training can sometimes reduce raw benchmark performance; minimizing this tax while maintaining strong alignment is an active area of research.

AI AlignmentFine-TuningGenerative AIGenAI Explained2026-02-19
Reward hacking — when models game the reward signal
1:57

Reward hacking — when models game the reward signal

Models can learn to exploit reward model weaknesses — producing verbose, sycophantic, or superficially impressive responses rather than genuinely better ones.

AI AlignmentFine-TuningGenerative AIGenAI Explained2026-02-19
RLAIF — AI feedback replacing human feedback
1:32

RLAIF — AI feedback replacing human feedback

Using a stronger AI model to generate preference labels instead of humans, scaling the alignment data pipeline far beyond human annotation capacity.

AI AlignmentFine-TuningGenerative AIGenAI Explained2026-02-19
Constitutional AI — self-supervised alignment via principles
1:32

Constitutional AI — self-supervised alignment via principles

The model critiques and revises its own outputs against a written set of principles, dramatically reducing dependence on expensive human labels.

AI AlignmentFine-TuningGenerative AIGenAI Explained2026-02-19
DPO — Direct Preference Optimization
1:23

DPO — Direct Preference Optimization

A simpler alternative to RLHF that eliminates the reward model, directly optimizing the LLM on human preference pairs — more stable and increasingly preferred.

AI AlignmentFine-TuningGenerative AIGenAI Explained2026-02-19
Reward modeling — learning human preferences at scale
1:26

Reward modeling — learning human preferences at scale

A separate neural network trained to score any model output by quality, serving as a scalable automated proxy for human judgment.

AI AlignmentFine-TuningGenerative AIGenAI Explained2026-02-19
RLHF — reinforcement learning from human feedback
1:42

RLHF — reinforcement learning from human feedback

Humans rank model outputs by quality; a reward model learns those preferences; the LLM is then optimized to maximize the learned reward signal.

AI AlignmentFine-TuningGenerative AIGenAI Explained2026-02-19
What is alignment — helpful, harmless, honest
1:25

What is alignment — helpful, harmless, honest

The discipline of ensuring AI systems behave according to human values and intentions, not just optimize for raw capability on benchmarks.

AI AlignmentFine-TuningGenerative AIGenAI Explained2026-02-19
When to fine-tune vs when to prompt
1:31

When to fine-tune vs when to prompt

Fine-tune when you need consistent style, format, or domain knowledge at scale with low latency; prompt when you need flexibility, rapid iteration, and have limited data.

Fine-TuningAI TrainingGenerative AIGenAI Explained2026-02-19
Model merging — combining models without training
1:26

Model merging — combining models without training

SLERP, TIES, DARE, and linear methods that blend weights from multiple fine-tuned models, often producing surprisingly capable hybrids at zero training cost.

Fine-TuningAI TrainingGenerative AIGenAI Explained2026-02-19
Catastrophic forgetting — when fine-tuning erases prior knowledge
1:48

Catastrophic forgetting — when fine-tuning erases prior knowledge

Training too aggressively on narrow data destroys general capabilities the base model had — low learning rates and regularization are key defenses.

Fine-TuningAI TrainingGenerative AIGenAI Explained2026-02-19
Instruction datasets — the data behind helpful assistants
1:25

Instruction datasets — the data behind helpful assistants

Datasets like FLAN, Alpaca, OpenAssistant, UltraChat, and ShareGPT that teach models the fundamental pattern of following human instructions.

Fine-TuningAI TrainingGenerative AIGenAI Explained2026-02-19
PEFT methods — the parameter-efficient fine-tuning family
1:41

PEFT methods — the parameter-efficient fine-tuning family

LoRA, prefix tuning, prompt tuning, IA³, and adapters — techniques that modify less than 1% of parameters while preserving base model quality.

Fine-TuningAI TrainingGenerative AIGenAI Explained2026-02-19
QLoRA — fine-tuning on consumer hardware
2:03

QLoRA — fine-tuning on consumer hardware

Combining 4-bit weight quantization with LoRA adapters makes it feasible to fine-tune a 70B-parameter model on a single 48GB consumer GPU.

Fine-TuningAI TrainingGenerative AIGenAI Explained2026-02-19
LoRA — low-rank adaptation for efficient fine-tuning
1:33

LoRA — low-rank adaptation for efficient fine-tuning

Freezing original model weights and training small rank-decomposed adapter matrices reduces fine-tuning compute by 10-100x with minimal quality loss.

Fine-TuningAI TrainingGenerative AIGenAI Explained2026-02-19
Supervised Fine-Tuning (SFT) — teaching instruction following
1:34

Supervised Fine-Tuning (SFT) — teaching instruction following

Training on curated (instruction, response) pairs transforms a raw base model into an assistant that follows directions helpfully and accurately.

Fine-TuningAI TrainingGenerative AIGenAI Explained2026-02-19
DeepSpeed & FSDP — distributed training frameworks
1:39

DeepSpeed & FSDP — distributed training frameworks

Microsoft DeepSpeed (ZeRO stages 1-3) and PyTorch FSDP manage the complexity of sharding parameters, gradients, and optimizer states across clusters.

Distributed TrainingAI TrainingGenerative AIGenAI Explained2026-02-19
Pipeline parallelism — different layers on different GPUs
1:13

Pipeline parallelism — different layers on different GPUs

Layers 1-20 on GPU set A, layers 21-40 on GPU set B — combined with micro-batching to keep all devices busy.

Distributed TrainingAI TrainingGenerative AIGenAI Explained2026-02-19
Tensor parallelism — splitting individual layers across GPUs
1:28

Tensor parallelism — splitting individual layers across GPUs

Weight matrices are sharded across devices so each GPU computes a slice of every layer — required for models too large for one device's memory.

Distributed TrainingAI TrainingGenerative AIGenAI Explained2026-02-19
Data parallelism — same model, different data on each GPU
1:23

Data parallelism — same model, different data on each GPU

Every GPU holds a full copy of the model but processes different mini-batches; gradients are averaged across devices after each step.

Distributed TrainingAI TrainingGenerative AIGenAI Explained2026-02-19
Why distributed training — no single GPU is enough
1:52

Why distributed training — no single GPU is enough

Frontier model weights, activations, and optimizer states vastly exceed any single GPU's memory; training requires coordinating thousands of devices.

Distributed TrainingAI TrainingGenerative AIGenAI Explained2026-02-19
Training compute — measuring cost in FLOPs and GPU-hours
1:23

Training compute — measuring cost in FLOPs and GPU-hours

Frontier models cost $50-100M+ to train; understanding compute budgets frames what is feasible at different organizational scales.

AI TrainingLarge Language ModelsGenerative AIGenAI Explained2026-02-19
Continued pre-training — expanding a model's knowledge domain
1:44

Continued pre-training — expanding a model's knowledge domain

Adding large domain-specific corpora (medical, legal, financial, scientific) to a base model to deepen expertise before fine-tuning.

AI TrainingLarge Language ModelsGenerative AIGenAI Explained2026-02-18
Learning rate schedules — warming up and cooling down
1:22

Learning rate schedules — warming up and cooling down

Cosine decay with linear warmup is standard: gradually increase the learning rate at the start, then smoothly decrease it over the run.

AI TrainingLarge Language ModelsGenerative AIGenAI Explained2026-02-18
Training loss curves — reading the heartbeat of pre-training
1:44

Training loss curves — reading the heartbeat of pre-training

Smoothly decreasing loss means healthy training; spikes signal bad data batches, learning rate issues, or hardware failures.

AI TrainingLarge Language ModelsGenerative AIGenAI Explained2026-02-18
Chinchilla optimal — balancing parameters and tokens
1:39

Chinchilla optimal — balancing parameters and tokens

DeepMind's research showing that for a fixed compute budget, the optimal strategy scales data and parameters in roughly equal proportion.

AI TrainingLarge Language ModelsGenerative AIGenAI Explained2026-02-18
Data mixture & weighting — balancing domains during training
1:30

Data mixture & weighting — balancing domains during training

The ratio of code, math, science, conversation, and books in training data directly shapes which capabilities the finished model develops.

AI TrainingLarge Language ModelsGenerative AIGenAI Explained2026-02-18
Training data curation — filtering the internet for quality
1:35

Training data curation — filtering the internet for quality

Deduplication, toxicity filtering, domain balancing, quality scoring, and PII removal transform raw web crawls into effective training corpora.

AI TrainingLarge Language ModelsGenerative AIGenAI Explained2026-02-18
The pre-training recipe — data, compute, and objectives
1:23

The pre-training recipe — data, compute, and objectives

Curating trillions of tokens, allocating thousands of GPUs, and running next-token prediction for weeks to months at enormous cost.

AI TrainingLarge Language ModelsGenerative AIGenAI Explained2026-02-18
Scaling laws — predictable performance from compute investment
1:23

Scaling laws — predictable performance from compute investment

Chinchilla and Kaplan laws show that model quality improves as a smooth power law function of parameters, data, and compute budget.

Large Language ModelsGenerative AIGenAI ExplainedAI Podcast2026-02-18
Emergent abilities — capabilities that appear at scale
1:47

Emergent abilities — capabilities that appear at scale

Skills like in-context learning and multi-step reasoning that only manifest when models cross certain parameter/data thresholds.

Large Language ModelsGenerative AIGenAI ExplainedAI Podcast2026-02-18
Context window — the model's working memory
1:49

Context window — the model's working memory

The maximum tokens processable in a single forward pass; ranges from 4K to 1M+ and directly limits what the model can reason about per request.

Large Language ModelsGenerative AIGenAI ExplainedAI Podcast2026-02-18
Next-token prediction — the deceptively simple training objective
1:25

Next-token prediction — the deceptively simple training objective

Predicting the next token in a sequence: this single objective, applied at massive scale, produces reasoning, coding, and creative abilities.

Large Language ModelsGenerative AIGenAI ExplainedAI Podcast2026-02-18
Mistral & Mixtral — efficient European models
1:29

Mistral & Mixtral — efficient European models

Mistral 7B and Mixtral 8x7B demonstrated that smaller, well-architected models (especially MoE) punch far above their parameter count.

Large Language ModelsGenerative AIGenAI ExplainedAI Podcast2026-02-18
Gemini — Google's natively multimodal family
1:41

Gemini — Google's natively multimodal family

Trained from the ground up on text, images, audio, and video, processing all modalities in a unified transformer architecture.

Large Language ModelsGenerative AIGenAI ExplainedAI Podcast2026-02-18
Claude — Anthropic's safety-first model family
1:50

Claude — Anthropic's safety-first model family

Built with Constitutional AI and RLHF, emphasizing being helpful, harmless, and honest — proving alignment and capability can advance together.

Large Language ModelsGenerative AIGenAI ExplainedAI Podcast2026-02-18
GPT family — OpenAI's foundational lineage
1:29

GPT family — OpenAI's foundational lineage

From GPT-1 (117M params) to GPT-4 (rumored 1.8T MoE), the series that defined the modern LLM paradigm and launched the GenAI era.

Large Language ModelsGenerative AIGenAI ExplainedAI Podcast2026-02-18
What is an LLM — language models at billion-parameter scale
1:35

What is an LLM — language models at billion-parameter scale

Transformer decoders with billions of parameters trained on trillions of tokens, exhibiting broad language understanding and generation capabilities.

Large Language ModelsGenerative AIGenAI ExplainedAI Podcast2026-02-18
ALiBi & position extrapolation — extending context beyond training length
1:34

ALiBi & position extrapolation — extending context beyond training length

Adding position-dependent linear bias to attention scores, allowing models to handle sequences longer than their training context window.

Attention MechanismTransformersGenerative AIGenAI Explained2026-02-18
Rotary Position Embeddings (RoPE) — modern position encoding
1:18

Rotary Position Embeddings (RoPE) — modern position encoding

Encodes relative position by rotating Q/K vectors in pairs, enabling better generalization to sequence lengths not seen during training.

Attention MechanismTransformersGenerative AIGenAI Explained2026-02-18
Sliding window attention — local context for efficiency
1:58

Sliding window attention — local context for efficiency

Each token only attends to a fixed window of nearby tokens instead of the full sequence, reducing cost from O(n²) to O(n·w).

Attention MechanismTransformersGenerative AIGenAI Explained2026-02-18
Grouped-Query Attention (GQA) — the practical middle ground
1:46

Grouped-Query Attention (GQA) — the practical middle ground

Groups of heads share K/V projections (e.g., 8 groups for 32 heads), balancing quality retention with efficiency — the default in LLaMA 3 and Mistral.

Attention MechanismTransformersGenerative AIGenAI Explained2026-02-18
Multi-Query Attention (MQA) — sharing K/V across all heads
1:49

Multi-Query Attention (MQA) — sharing K/V across all heads

All attention heads share a single set of key/value projections, dramatically reducing KV cache memory and boosting inference speed.

Attention MechanismTransformersGenerative AIGenAI Explained2026-02-18
SwiGLU & modern activations — inside frontier transformers
1:28

SwiGLU & modern activations — inside frontier transformers

SwiGLU replaces older ReLU in modern transformers (LLaMA, Mistral), providing smoother gradients and measurably better training dynamics.

TransformersAI BasicsGenerative AIGenAI Explained2026-02-18
The attention bottleneck — O(n²) cost of full attention
1:18

The attention bottleneck — O(n²) cost of full attention

Attention scales quadratically with sequence length; a 100K-token input requires 10 billion attention pair computations per layer.

TransformersAI BasicsGenerative AIGenAI Explained2026-02-18
Causal masking — why decoders can't peek ahead
1:10

Causal masking — why decoders can't peek ahead

Future tokens are masked during training so each position only attends to past tokens, enabling left-to-right autoregressive generation.

TransformersAI BasicsGenerative AIGenAI Explained2026-02-18
Encoder vs decoder vs encoder-decoder
1:35

Encoder vs decoder vs encoder-decoder

BERT uses an encoder (understanding), GPT uses a decoder (generation), T5 uses both — different configurations optimized for different GenAI tasks.

TransformersAI BasicsGenerative AIGenAI Explained2026-02-18
Residual connections & layer norm — stability for deep models
1:30

Residual connections & layer norm — stability for deep models

Skip connections add each sub-layer's input to its output, and normalization prevents values from exploding, enabling stable 100+ layer training.

TransformersAI BasicsGenerative AIGenAI Explained2026-02-18
Feed-forward networks — per-token transformation after attention
1:49

Feed-forward networks — per-token transformation after attention

After attention mixes information across tokens, independent feed-forward layers transform each token's representation with nonlinear activation functions.

TransformersAI BasicsGenerative AIGenAI Explained2026-02-18
Multi-head attention — parallel perspectives on the same input
1:41

Multi-head attention — parallel perspectives on the same input

Multiple attention mechanisms run simultaneously, each learning to capture different relationship types like syntax, semantics, and coreference.

TransformersAI BasicsGenerative AIGenAI Explained2026-02-18
Query, Key, Value — the three vectors of attention
1:28

Query, Key, Value — the three vectors of attention

Tokens generate Q, K, V projections; attention scores come from Q·K dot-product similarity, and the output is V weighted by those scores.

TransformersAI BasicsGenerative AIGenAI Explained2026-02-18
Self-attention — every token looks at every other
1:24

Self-attention — every token looks at every other

Each token computes relevance scores against all other tokens, capturing long-range dependencies in a single parallel computation step.

TransformersAI BasicsGenerative AIGenAI Explained2026-02-18
The Transformer — the engine of modern GenAI
1:37

The Transformer — the engine of modern GenAI

Published in 2017's "Attention Is All You Need," this architecture replaced recurrent networks and became the foundation of every frontier GenAI model.

TransformersAI BasicsGenerative AIGenAI Explained2026-02-18
Token economics — why every token has a price
1:52

Token economics — why every token has a price

API providers charge per input and output token; understanding tokenization directly impacts cost estimation, prompt design, and budget optimization.

AI BasicsAI TokenizationGenerative AIGenAI Explained2026-02-18
Positional encoding — teaching word order to parallel models
1:26

Positional encoding — teaching word order to parallel models

Since transformers process all tokens simultaneously, position must be explicitly injected via sinusoidal functions or learned embeddings.

AI BasicsAI TokenizationGenerative AIGenAI Explained2026-02-18
Word embeddings — turning tokens into vectors
1:37

Word embeddings — turning tokens into vectors

Each token maps to a learned high-dimensional vector where semantic proximity in space encodes similarity in meaning.

AI BasicsAI TokenizationGenerative AIGenAI Explained2026-02-18
Special tokens — control signals for models
1:38

Special tokens — control signals for models

\[BOS\], \[EOS\], \[PAD\], \<\|im\_start\|\>, \<tool\_call\> — reserved tokens that mark boundaries, roles, and structure for the model.

AI BasicsAI TokenizationGenerative AIGenAI Explained2026-02-18
Vocabulary size tradeoffs — why 32K, 50K, or 100K tokens
1:34

Vocabulary size tradeoffs — why 32K, 50K, or 100K tokens

Larger vocabularies produce fewer tokens per text (cheaper inference) but require bigger embedding tables and more parameters to train.

AI BasicsAI TokenizationGenerative AIGenAI Explained2026-02-18
SentencePiece & tiktoken — tokenizer implementations
1:54

SentencePiece & tiktoken — tokenizer implementations

SentencePiece (Google) and tiktoken (OpenAI) are the standard libraries for fast, language-agnostic tokenization used across model families.

AI BasicsAI TokenizationGenerative AIGenAI Explained2026-02-18
Byte-Pair Encoding (BPE) — how tokenizers learn to split text
1:33

Byte-Pair Encoding (BPE) — how tokenizers learn to split text

Starting from individual bytes or characters, BPE iteratively merges the most frequent adjacent pairs until reaching a target vocabulary size.

AI BasicsAI TokenizationGenerative AIGenAI Explained2026-02-18
What are tokens — the atoms of language models
1:21

What are tokens — the atoms of language models

Models don't see words or characters; they see tokens — subword units that balance vocabulary size with text coverage.

AI BasicsAI TokenizationGenerative AIGenAI Explained2026-02-18
GenAI timeline — from GPT-1 to today's frontier
1:32

GenAI timeline — from GPT-1 to today's frontier

A chronological tour: GPT-1 (2018), GPT-3 (2020), DALL-E (2021), ChatGPT (2022), GPT-4 and Claude (2023), multimodal omni models (2024-25).

AI BasicsGenerative AIGenAI ExplainedAI Podcast2026-02-18
The GenAI stack — hardware, models, orchestration, apps
1:24

The GenAI stack — hardware, models, orchestration, apps

From GPU clusters at the bottom to model weights to orchestration frameworks to end-user apps at the top — the full technology stack powering GenAI.

AI BasicsGenerative AIGenAI ExplainedAI Podcast2026-02-18
Closed vs open models — APIs vs downloadable weights
1:42

Closed vs open models — APIs vs downloadable weights

OpenAI and Anthropic offer API access; Meta and Mistral release weights — each path has different tradeoffs in cost, control, privacy, and customization.

AI BasicsGenerative AIGenAI ExplainedAI Podcast2026-02-18
Parameters — the learned numbers inside a model
1:36

Parameters — the learned numbers inside a model

Each parameter is a single number learned during training; modern GenAI models have billions, collectively encoding everything the model knows.

AI BasicsGenerative AIGenAI ExplainedAI Podcast2026-02-18
The training-inference split — building the brain vs using it
1:35

The training-inference split — building the brain vs using it

Training costs millions of dollars and takes weeks on thousands of GPUs; inference serves billions of requests cheaply — two fundamentally different engineering problems.

AI BasicsAI InferenceGenerative AIGenAI Explained2026-02-18
How GenAI generates — one token or step at a time
1:41

How GenAI generates — one token or step at a time

Text models predict the next token autoregressively; image models denoise step by step — both are iterative generation processes.

AI BasicsGenerative AIGenAI ExplainedAI Podcast2026-02-18
Foundation models — one model, many tasks
1:07

Foundation models — one model, many tasks

Massive models pre-trained on broad data that can be adapted to countless downstream tasks without retraining from scratch.

AI BasicsGenerative AIGenAI ExplainedAI Podcast2026-02-18
The GenAI modality map — text, image, audio, video, code, 3D
1:29

The GenAI modality map — text, image, audio, video, code, 3D

A survey of every output type GenAI can produce today and the distinct model families that power each modality.

AI BasicsGenerative AIGenAI ExplainedAI Podcast2026-02-18
How GenAI differs from traditional AI — generation vs classification
1:28

How GenAI differs from traditional AI — generation vs classification

Traditional ML sorts, ranks, and predicts from fixed categories; GenAI synthesizes novel outputs by sampling from a learned distribution of possibilities.

AI BasicsGenerative AIGenAI ExplainedAI Podcast2026-02-18
What is generative AI — models that create new content
1:18

What is generative AI — models that create new content

Unlike traditional AI that classifies or predicts, GenAI produces entirely new text, images, code, and audio from learned patterns.

AI BasicsGenerative AIGenAI ExplainedAI Podcast2026-02-18