AI Podcast

119 episodes: 90-second audio overviews on AI.

1:45

LLM layers — architecture of a large language model

A large language model is a deep stack of structurally identical Transformer layers, each with its own learned weights. As a rough tendency, early layers capture grammar, middle layers capture semantics, and deep layers handle reasoning and world knowledge.

Large Language Models · Transformers · AI Architecture · Generative AI · 2026-02-21

1:55

ControlNet — adding spatial conditioning

Injecting structural control signals (edge maps, human poses, depth maps) alongside text prompts for precise spatial layout control over the generated image.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19

1:36

Classifier-free guidance (CFG) — controlling prompt adherence

Blending conditional (text-guided) and unconditional predictions during generation; higher CFG values follow the text prompt more strictly at the cost of diversity.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19
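
The blending described in the classifier-free guidance entry above is commonly written as uncond + scale * (cond - uncond). The following is a minimal sketch of that formula on plain Python lists; the function name and the toy prediction values are illustrative assumptions, not taken from the episode.

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Blend unconditional and conditional noise predictions element-wise.

    guidance_scale = 0 returns the unconditional prediction,
    guidance_scale = 1 returns the conditional prediction,
    larger values extrapolate further toward the text-conditioned direction.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy noise predictions (made-up numbers standing in for model outputs).
uncond = [0.10, -0.20, 0.05]
cond = [0.30, -0.10, 0.00]

baseline = cfg_combine(uncond, cond, 1.0)  # matches the conditional prediction
strong = cfg_combine(uncond, cond, 7.5)    # pushed well past it
print(baseline, strong)
```

Scales above 1 amplify whatever the prompt changes about the prediction, which is one way to see why adherence rises while diversity falls.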

1:35

CLIP guidance — text-image alignment for generation

OpenAI's CLIP model provides a shared text-image embedding space; the similarity between an image's embedding and the prompt's embedding is used to steer the diffusion process toward images matching a text description.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19
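
For the CLIP guidance entry above, the alignment score in a shared embedding space is typically a cosine similarity. The sketch below shows only that scoring step, with made-up three-dimensional vectors standing in for real CLIP embeddings. Actual guidance would also need the gradient of this score with respect to the image, which is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def best_match(text_embedding, image_embeddings):
    """Index of the image embedding most aligned with the text embedding."""
    scores = [cosine_similarity(text_embedding, e) for e in image_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical embeddings; real CLIP vectors have hundreds of dimensions.
text = [1.0, 0.0, 1.0]
images = [[0.0, 1.0, 0.0], [0.9, 0.1, 1.1], [-1.0, 0.0, -1.0]]
print(best_match(text, images))
```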

1:53

Diffusion Transformers (DiT) — replacing U-Net with transformers

Using transformer blocks instead of U-Net for the denoising network — powers Sora, Flux, and SD3, offering better scaling and quality at large sizes.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19

1:43

Latent diffusion — diffusing in compressed space

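Running the diffusion process in a VAE's latent space rather than on raw pixels, making generation fast and memory-efficient. In Stable Diffusion, for example, the VAE downsamples by 8x per side, so the latent grid has 64x fewer spatial positions than the pixel grid.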

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19
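
The size reduction in the latent diffusion entry above can be checked with a little arithmetic. This sketch assumes the Stable Diffusion 1.x configuration (8x downsampling per side, 4 latent channels, 3 image channels); other models use different values, and the function names are illustrative.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for a given image size."""
    return (latent_channels, height // downsample, width // downsample)


def compression(height, width, image_channels=3, downsample=8, latent_channels=4):
    """Return (spatial_ratio, total_value_ratio) between pixel and latent space."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    spatial = (height * width) / (h * w)
    total = (height * width * image_channels) / (c * h * w)
    return spatial, total


shape = latent_shape(512, 512)
spatial, total = compression(512, 512)
print(shape, spatial, total)  # (4, 64, 64) 64.0 48.0
```

Under these assumptions the 64x figure counts spatial positions; counting every stored value (channels included) gives 48x.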

1:19

U-Net — the denoising backbone

An encoder-decoder convolutional network with skip connections that predicts the noise to remove at each diffusion step — the workhorse architecture of Stable Diffusion 1.x and 2.x.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19

2:04

Noise schedules — controlling how noise is added

Linear, cosine, or learned schedules define how much noise is injected at each of the T timesteps — directly impacting generation quality and training stability.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19
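
To make the noise schedules entry above concrete, here is a minimal sketch of a linear and a cosine schedule. The default values (beta from 1e-4 to 0.02, offset s = 0.008, clipping at 0.999) are the commonly cited ones from the DDPM and improved-DDPM papers and are assumptions here; the episode does not specify them.

```python
import math


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly from beta_start to beta_end."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def cosine_betas(T, s=0.008, max_beta=0.999):
    """Betas derived from a squared-cosine curve for the cumulative signal level."""
    def alpha_bar(t):
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

    return [min(1 - alpha_bar(i + 1) / alpha_bar(i), max_beta) for i in range(T)]


def cumulative_alphas(betas):
    """Running product of (1 - beta): the fraction of signal variance left at each step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


lin = cumulative_alphas(linear_betas(1000))
cos = cumulative_alphas(cosine_betas(1000))
# Compare how much signal survives halfway through each schedule.
print(round(lin[499], 4), round(cos[499], 4))
```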

1:15

The diffusion process — forward noise, reverse denoise

Forward process: gradually add Gaussian noise over many steps until the image becomes pure static. Reverse process: learn to undo each step, recovering a clean image.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19
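
The forward process in the entry above has a well-known closed form: the noisy sample at step t is sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise. The sketch below assumes a linear schedule with typical DDPM defaults and uses a short list of floats in place of an image; all names are illustrative.

```python
import math
import random


def alpha_bars(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear schedule."""
    out, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / max(T - 1, 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Jump straight to step t of the forward process; returns (x_t, noise)."""
    a = abar[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise


abar = alpha_bars(1000)
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]  # stand-in for pixel values
xt, noise = q_sample(x0, 500, abar, rng)
print(xt)
```

A denoising network is usually trained to predict `noise` from `xt` and `t`; with a perfect prediction the clean sample can be recovered by inverting the formula, which is the idea behind the reverse process.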

1:58

Why diffusion won — comparing generative architectures

Diffusion models offer more stable training, better mode coverage (and therefore more diverse outputs), and often higher fidelity than GANs, which is why they replaced GANs as the dominant approach for image and video generation.

Image Generation · AI Basics · Generative AI · GenAI Explained · 2026-02-19

1:40

Normalizing Flows — invertible generation with exact likelihoods

Chains of invertible mathematical transformations that map simple distributions to complex ones, offering exact likelihood computation, unlike GANs (which define no explicit likelihood) or VAEs (which optimize only a lower bound on it).

Image Generation · AI Basics · Generative AI · GenAI Explained · 2026-02-19
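
The exact likelihood mentioned in the normalizing flows entry above comes from the change-of-variables rule: invert each transformation and subtract the log of its scaling factor. This is a minimal one-dimensional sketch using affine layers only, with a standard normal base distribution; real flows use richer invertible layers, and the names here are illustrative.

```python
import math


def standard_normal_logpdf(z):
    """Log density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def flow_log_prob(x, layers):
    """Exact log p(x) for a chain of 1-D affine layers.

    layers is a list of (scale, shift) pairs applied in order from base
    sample z to data x, each computing scale * value + shift (scale != 0).
    """
    log_det = 0.0
    for scale, shift in reversed(layers):
        x = (x - shift) / scale            # invert this layer
        log_det -= math.log(abs(scale))    # log |d(inverse)/dx| for this layer
    return standard_normal_logpdf(x) + log_det


# Two layers whose composition is x = z - 2.5, so x follows N(-2.5, 1).
layers = [(2.0, 1.0), (0.5, -3.0)]
print(flow_log_prob(-2.5, layers))
```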

1:55

GAN challenges — mode collapse and training instability

GANs are notoriously difficult to train: the generator may produce limited variety (mode collapse), and the adversarial balance is fragile and sensitive to hyperparameters.

Image Generation · AI Basics · Generative AI · GenAI Explained · 2026-02-19

1:25

GAN applications — StyleGAN, deepfakes, super-resolution

GANs powered photorealistic face generation (StyleGAN), image enhancement (ESRGAN), and synthetic media — the dominant GenAI paradigm before diffusion.

Image Generation · AI Basics · Generative AI · GenAI Explained · 2026-02-19

1:19

GANs — generator vs discriminator competition

Two networks in adversarial training: a generator creates fakes, a discriminator detects them — the competition drives both to improve, producing increasingly realistic outputs.

Image Generation · AI Basics · Generative AI · GenAI Explained · 2026-02-19

1:39

Variational Autoencoders (VAEs) — generating from learned distributions

Unlike basic autoencoders, VAEs encode inputs as probability distributions, enabling smooth interpolation between examples and sampling of entirely new outputs.

Image Generation · AI Basics · Generative AI · GenAI Explained · 2026-02-19
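
Encoding inputs as distributions, as the VAE entry above describes, is usually implemented with the reparameterization trick plus a KL penalty toward a standard normal prior. The sketch below assumes a diagonal Gaussian encoder output given as a mean and log-variance per dimension; the values are made up and no actual encoder or decoder network is included.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def interpolate(z_a, z_b, alpha):
    """Linear interpolation between two latent codes (alpha in [0, 1])."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


rng = random.Random(0)
z_a = reparameterize([0.5, -1.0], [-2.0, -2.0], rng)  # hypothetical encoder outputs
z_b = reparameterize([-0.5, 1.0], [-2.0, -2.0], rng)
midpoint = interpolate(z_a, z_b, 0.5)
print(midpoint, kl_to_standard_normal([0.5, -1.0], [-2.0, -2.0]))
```

Decoding `midpoint` with a trained decoder would give an output between the two examples; sampling z from N(0, 1) directly is how entirely new outputs are generated.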

1:40

Latent space — the compressed world where generation happens

The bottleneck layer in an autoencoder where high-dimensional data (images, text) is compressed into a dense, navigable, lower-dimensional representation.

Image Generation · AI Basics · Generative AI · GenAI Explained · 2026-02-19

1:38

Autoencoders — compressing and reconstructing data

Neural networks that learn to encode input into a compact bottleneck representation and decode it back — the architectural foundation of latent space.

Image Generation · AI Basics · Generative AI · GenAI Explained · 2026-02-19

1:18

Coding benchmarks — HumanEval, SWE-bench, MBPP

Standard evaluations measuring code generation quality: from simple function completion (HumanEval) to resolving real GitHub issues (SWE-bench).

AI Code Generation · Prompt Engineering · Generative AI · GenAI Explained · 2026-02-19
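
The entry above does not name a metric, but function-completion benchmarks such as HumanEval and MBPP are commonly scored with pass@k: the chance that at least one of k sampled solutions passes the tests. The following is a sketch of the usual unbiased estimator, given n samples per problem of which c pass; treat it as an illustration rather than any benchmark's official harness.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k from n generated samples, c of which passed the tests."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset contains a pass.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples for one problem, 3 of them correct.
print(pass_at_k(10, 3, 1), pass_at_k(10, 3, 5))
```

Benchmark scores are then the average of this value over all problems. SWE-bench is usually reported differently, as the fraction of issues resolved.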

1:29

Repository-level code understanding — beyond single files

Models that navigate imports, call graphs, type systems, and project structure to generate contextually correct changes spanning multiple files.

AI Code Generation · Prompt Engineering · Generative AI · GenAI Explained · 2026-02-19

1:37

Code execution feedback — running code to self-correct

Agents that generate code, execute it in a sandbox, read error messages, and iteratively fix bugs until all tests pass — closing the generate-test loop.

AI Code Generation · Prompt Engineering · Generative AI · GenAI Explained · 2026-02-19
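
The generate-test loop in the code execution feedback entry above can be sketched in a few lines. Everything here is a simplified assumption: `exec` in a fresh namespace stands in for a real sandbox (it provides no isolation), and `make_scripted_model` is a hypothetical stand-in that replays fixed candidates instead of calling an LLM.

```python
import traceback


def run_candidate(source, test_source):
    """Execute candidate code plus its tests; return (passed, error_text)."""
    namespace = {}
    try:
        exec(source, namespace)
        exec(test_source, namespace)
    except Exception:
        return False, traceback.format_exc()
    return True, ""


def repair_loop(generate, test_source, max_rounds=3):
    """Ask for code, run it, and feed the error text back until tests pass."""
    feedback = ""
    for round_index in range(max_rounds):
        source = generate(feedback)
        passed, feedback = run_candidate(source, test_source)
        if passed:
            return source, round_index + 1
    return None, max_rounds


def make_scripted_model(candidates):
    """Hypothetical model stub: returns the next canned candidate each call."""
    attempts = iter(candidates)

    def generate(feedback):
        # A real agent would put `feedback` into its next prompt.
        return next(attempts)

    return generate


buggy = "def add(a, b):\n    return a - b\n"
fixed = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 3) == 5\n"

solution, rounds = repair_loop(make_scripted_model([buggy, fixed]), tests)
print(rounds)
```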

1:37

Code generation from natural language — describing what you want

Translating English descriptions into working functions, classes, and scripts — the core use case driving AI-assisted software development.

AI Code Generation · Prompt Engineering · Generative AI · GenAI Explained · 2026-02-19

1:23

Fill-in-the-middle (FIM) — bidirectional code completion

Training models to predict missing code given both the prefix and suffix context, powering the inline autocomplete experience in tools like Copilot and Cursor.

AI Code Generation · Prompt Engineering · Generative AI · GenAI Explained · 2026-02-19
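
One common way to set up the fill-in-the-middle task from the entry above is to cut a document into prefix, middle, and suffix, then rearrange it so the model sees prefix and suffix first and generates the middle last. In this sketch the sentinel strings `<PRE>`, `<SUF>`, and `<MID>` are placeholders; each model family defines its own special tokens, and the character-level split is a simplifying assumption.

```python
import random

# Placeholder sentinels (hypothetical); real models use their own special tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def make_fim_prompt(prefix, suffix):
    """Inference-time prompt: the model continues after MID with the missing code."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"


def make_fim_example(document, rng):
    """Training example: split at two random cut points, return (prompt, target)."""
    i, j = sorted(rng.randrange(len(document) + 1) for _ in range(2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    return make_fim_prompt(prefix, suffix), middle


doc = "def area(w, h):\n    return w * h\n"
prompt, target = make_fim_example(doc, random.Random(0))
print(repr(prompt), repr(target))

# Editor-style usage: the cursor sits between the two pieces of text.
editor_prompt = make_fim_prompt("def area(w, h):\n    return ", "\n")
```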

1:35

Code LLMs — models specialized for programming

Codex, CodeLlama, StarCoder, DeepSeek Coder — models trained on massive code corpora that understand syntax, APIs, libraries, and programming patterns.

AI Code Generation · Prompt Engineering · Generative AI · GenAI Explained · 2026-02-19

1:20

Meta-prompting — LLMs writing better prompts

Using one LLM to generate, evaluate, and iteratively optimize prompts for another model, automating the prompt engineering process itself.