Image Generation

16 episodes: roughly 90-second audio overviews of image generation.

1:55

ControlNet — adding spatial conditioning

Injecting structural control signals (edge maps, human poses, depth maps) alongside the text prompt, so the spatial layout of the generated image can be controlled precisely.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19

1:36

Classifier-free guidance (CFG) — controlling prompt adherence

Combining the model's conditional (text-guided) and unconditional noise predictions at each sampling step; a higher CFG scale makes the output follow the text prompt more strictly at the cost of diversity.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19
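The CFG combination fits in one line per element. A minimal sketch in plain Python, assuming the model's two noise predictions are already available as lists of floats; the function name `cfg_combine` and the toy numbers are illustrative, not taken from any particular library:

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: start from the unconditional noise prediction
    and move toward (and, for scales > 1, past) the text-conditioned one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions for a 3-element "latent".
eps_uncond = [0.10, -0.20, 0.30]
eps_cond = [0.20, -0.10, 0.10]

# Scale 1.0 reproduces the conditional prediction; larger scales extrapolate
# further in the direction the prompt pulls.
print(cfg_combine(eps_uncond, eps_cond, 1.0))
print(cfg_combine(eps_uncond, eps_cond, 7.5))
```

Conventions differ: the original classifier-free guidance paper writes the same combination as (1 + w)·eps_cond − w·eps_uncond, so its w = 0 corresponds to a scale of 1 here.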

1:35

CLIP guidance — text-image alignment for generation

OpenAI's CLIP model maps text and images into a shared embedding space; during sampling, the gradient of the CLIP similarity between the prompt and the image steers the diffusion process toward images that match the text description.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19
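The steering signal is a gradient step that raises text-image similarity. Real CLIP guidance differentiates through CLIP's image encoder to get that gradient with respect to the image; this toy sketch assumes the encoder is the identity, so the "image" is just its embedding and the cosine-similarity gradient can be written by hand. The embeddings and step size are made-up illustrative values:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_grad(img, txt):
    """Analytic gradient of cosine_similarity(img, txt) with respect to img."""
    dot = sum(x * y for x, y in zip(img, txt))
    ni = math.sqrt(sum(x * x for x in img))
    nt = math.sqrt(sum(y * y for y in txt))
    return [y / (ni * nt) - dot * x / (ni ** 3 * nt) for x, y in zip(img, txt)]


text_emb = [1.0, 0.0, 0.0]   # stand-in for the CLIP embedding of the prompt
image_emb = [0.2, 1.0, 0.5]  # stand-in for the current image (identity encoder assumed)

step_size = 0.5
for step in range(5):
    sim = cosine_similarity(image_emb, text_emb)
    print(f"step {step}: similarity={sim:.3f}")
    grad = cosine_grad(image_emb, text_emb)
    # Gradient ascent: nudge the image toward higher text-image similarity.
    image_emb = [x + step_size * g for x, g in zip(image_emb, grad)]
```

In a real sampler this nudge is applied alongside each denoising step rather than in a standalone loop.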

1:53

Diffusion Transformers (DiT) — replacing U-Net with transformers

Using transformer blocks instead of a U-Net as the denoising network. This design powers Sora, Flux, and SD3, and it scales well: sample quality keeps improving as model size and compute grow.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19
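A transformer works on a sequence of tokens, so DiT-style models first cut the latent into small non-overlapping patches. A minimal pure-Python sketch of that patchify step, assuming a channels-first latent stored as nested lists; the 4×32×32 shape and patch size 2 are illustrative assumptions, and the linear projection and positional embeddings that real models apply afterwards are omitted:

```python
def patchify(latent, patch):
    """Split a latent of shape [C][H][W] into non-overlapping patch x patch
    tiles and flatten each tile into one token of length C * patch * patch.
    Tokens are returned in row-major (raster) order over the patch grid."""
    channels, height, width = len(latent), len(latent[0]), len(latent[0][0])
    assert height % patch == 0 and width % patch == 0, "H and W must divide by patch"
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [
                latent[c][top + dy][left + dx]
                for c in range(channels)
                for dy in range(patch)
                for dx in range(patch)
            ]
            tokens.append(token)
    return tokens


# Illustrative latent: 4 channels, 32 x 32, filled with distinct values.
C, H, W, P = 4, 32, 32, 2
latent = [[[c * H * W + y * W + x for x in range(W)] for y in range(H)] for c in range(C)]

tokens = patchify(latent, P)
print(len(tokens), len(tokens[0]))  # → 256 16
```

Sequence length grows as (H / P) × (W / P), so halving the patch size quadruples the token count and the attention cost with it.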

1:43

Latent diffusion — diffusing in compressed space

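Running the diffusion process in the compressed latent space of a VAE rather than on raw pixels (Stable Diffusion's autoencoder downsamples 8x per side, so the latent has 64x fewer spatial positions than the image), which makes generation fast and memory-efficient.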

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19
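The savings are easy to check with arithmetic. A small sketch, assuming a Stable Diffusion 1.x-style autoencoder that downsamples by 8 per side and outputs 4 latent channels; other VAEs use different factors, so treat the defaults as assumptions:

```python
def latent_savings(height, width, downsample=8, image_channels=3, latent_channels=4):
    """Compare the size of an RGB image with the VAE latent that the
    diffusion model actually denoises."""
    latent_h, latent_w = height // downsample, width // downsample
    pixel_values = height * width * image_channels
    latent_values = latent_h * latent_w * latent_channels
    return {
        "latent_shape": (latent_channels, latent_h, latent_w),
        "spatial_reduction": (height * width) // (latent_h * latent_w),
        "value_reduction": pixel_values / latent_values,
    }


print(latent_savings(512, 512))
# → {'latent_shape': (4, 64, 64), 'spatial_reduction': 64, 'value_reduction': 48.0}
```

The 64x figure counts spatial positions; counting every stored value, the latent is 48x smaller because it carries 4 channels where the image has 3.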

1:19

U-Net — the denoising backbone

An encoder-decoder convolutional network with skip connections that predicts the noise to remove at each diffusion step — the workhorse architecture of Stable Diffusion 1.x and 2.x.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19

2:04

Noise schedules — controlling how noise is added

Linear, cosine, or learned schedules define how much noise is injected at each of the T timesteps — directly impacting generation quality and training stability.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19
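Two common fixed schedules can be compared directly. A minimal sketch, assuming the DDPM linear schedule (beta from 1e-4 to 0.02 over T = 1000) and the cosine schedule from Improved DDPM (offset s = 0.008, betas clipped at 0.999); learned schedules are not shown. Each schedule is summarized by alpha_bar_t, the fraction of the original signal's variance left after t steps:

```python
import math


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear schedule: per-step noise variance beta_t rises evenly from
    beta_start to beta_end."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cosine_betas(T, s=0.008, max_beta=0.999):
    """Cosine schedule: define the cumulative signal level alpha_bar(t) with a
    squared cosine, then recover each step's beta_t from consecutive ratios."""
    def alpha_bar(t):
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

    return [min(1 - alpha_bar(t + 1) / alpha_bar(t), max_beta) for t in range(T)]


def alpha_bars(betas):
    """Cumulative product of (1 - beta_t) over the steps taken so far."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1 - b
        out.append(prod)
    return out


T = 1000
linear_ab = alpha_bars(linear_betas(T))
cosine_ab = alpha_bars(cosine_betas(T))
for t in (0, 250, 500, 750, 999):
    print(f"t={t:4d}  linear alpha_bar={linear_ab[t]:.4f}  cosine alpha_bar={cosine_ab[t]:.4f}")
```

Running it shows the linear schedule wiping out most of the signal by the midpoint, while the cosine schedule removes it more gradually across the whole trajectory.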

1:15

The diffusion process — forward noise, reverse denoise

Forward process: gradually add Gaussian noise over many steps until the image becomes pure static. Reverse process: a neural network learns to undo each step, recovering a clean image from the noise.

Image Generation · Generative AI · GenAI Explained · AI Podcast · 2026-02-19
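Because every forward step adds Gaussian noise, the noisy image at any step t can be sampled in one shot from the clean image with the closed form x_t = sqrt(alpha_bar_t)·x_0 + sqrt(1 − alpha_bar_t)·noise. A minimal plain-Python sketch, assuming the DDPM linear beta schedule over 1,000 steps and treating a flat list of floats as the "image"; the reverse process (a trained network predicting the noise from x_t and t) is not shown:

```python
import math
import random


def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction alpha_bar_t = prod over s <= t of (1 - beta_s),
    for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Forward process: jump straight to step t by mixing the clean signal
    with fresh Gaussian noise. Returns (x_t, noise) because the noise is the
    usual training target for the denoising network."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1 - a) * n for x, n in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
alpha_bars = make_alpha_bars()
x0 = [1.0] * 10_000  # a flat "image": every pixel at 1.0

for t in (0, 100, 500, 999):
    xt, _ = q_sample(x0, t, alpha_bars, rng)
    mean = sum(xt) / len(xt)
    print(f"t={t:4d}  signal weight={math.sqrt(alpha_bars[t]):.3f}  mean pixel={mean:.3f}")
```

As t grows the signal weight shrinks toward zero and x_t approaches pure standard Gaussian noise, which is the "pure static" end state of the forward process.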

1:58

Why diffusion won — comparing generative architectures

Compared with GANs, diffusion models train more stably, cover more of the data distribution's modes (so their samples are more diverse), and at scale match or exceed GAN image fidelity, which is why they replaced GANs as the dominant approach for image and video generation.