AI Glossary
The Complete Dictionary of Artificial Intelligence
Latent Diffusion Model
Diffusion architecture that operates in a lower-dimensional latent space, obtained via an auto-encoder, to significantly reduce computational costs while maintaining high image generation quality.
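A minimal inference sketch in PyTorch, assuming hypothetical `unet`, `decoder`, and `scheduler` objects with the interfaces shown (not any specific library's API): sampling starts from Gaussian noise in latent space, and only the final decode touches pixel space.

```python
import torch

@torch.no_grad()
def generate(unet, decoder, scheduler, text_emb, latent_shape):
    z = torch.randn(latent_shape)      # start from pure noise in latent space
    for t in scheduler.timesteps:      # iterate reverse-diffusion timesteps
        eps = unet(z, t, text_emb)     # predict the noise at step t
        z = scheduler.step(eps, t, z)  # remove a portion of that noise
    return decoder(z)                  # decode the clean latent to pixels
```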
Perceptual Encoder
Part of the auto-encoder in an LDM that transforms a high-dimensional image (pixels) into a low-dimensional representation (latent), capturing essential semantic information.
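A toy encoder sketch, assuming a 3x256x256 RGB input and a 4-channel latent (the 8x spatial compression used by Stable Diffusion's VAE); a real perceptual encoder adds residual blocks and outputs distribution parameters rather than a single tensor.

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Three stride-2 convolutions: 3x256x256 image -> 4x32x32 latent."""
    def __init__(self, latent_channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(128, latent_channels, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)  # low-dimensional semantic representation
```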
Cross-Attention Conditioning
Attention mechanism that allows the latent diffusion model to integrate heterogeneous information, such as text (CLIP embeddings), to guide image generation in a flexible and precise manner.
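A minimal single-head cross-attention layer as a sketch; the dimensions (320 for latents, 768 for CLIP text embeddings) are illustrative assumptions. Queries come from the image latents, keys and values from the text.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, latent_dim=320, text_dim=768):
        super().__init__()
        self.to_q = nn.Linear(latent_dim, latent_dim)  # queries from latents
        self.to_k = nn.Linear(text_dim, latent_dim)    # keys from text
        self.to_v = nn.Linear(text_dim, latent_dim)    # values from text

    def forward(self, latents, text_emb):
        q = self.to_q(latents)                         # (B, N_latent, D)
        k = self.to_k(text_emb)                        # (B, N_text, D)
        v = self.to_v(text_emb)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v                                # text-conditioned update
```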
Noise Scheduler
Algorithm defining the variance of noise added at each timestep of the forward process, influencing the convergence speed and final generation quality in LDMs.
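A sketch of the linear beta schedule from the original DDPM formulation; `T` and the range endpoints are the commonly cited defaults, and the closed-form forward-noising step follows the standard q(x_t | x_0) expression.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # noise variance per timestep
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)     # total signal kept after t steps

def add_noise(x0, t, noise):
    # forward process for a scalar timestep index t:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * noise
```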
Noise Regression (Denoising)
Main task of the U-Net diffusion model, which consists of predicting the noise added to a latent at a given timestep, allowing it to be subtracted to progressively denoise the signal.
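One training step of this objective as a sketch, assuming a `unet(z_t, t, text_emb)` callable and the `alpha_bar` table from the scheduler entry above.

```python
import torch
import torch.nn.functional as F

def denoising_loss(unet, z0, text_emb, alpha_bar):
    t = torch.randint(0, len(alpha_bar), (z0.shape[0],))  # random timesteps
    eps = torch.randn_like(z0)                            # true noise
    ab = alpha_bar[t].view(-1, 1, 1, 1)                   # broadcast per sample
    z_t = ab.sqrt() * z0 + (1 - ab).sqrt() * eps          # forward-noised latent
    eps_pred = unet(z_t, t, text_emb)                     # predicted noise
    return F.mse_loss(eps_pred, eps)                      # the "L_simple" objective
```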
Hierarchical U-Net
Neural network architecture shaped like a U, with skip connections between its downsampling and upsampling paths, residual blocks, and attention mechanisms, used as the core of the diffusion model to predict the noise at each denoising step.
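A skeletal sketch of the shape only (one downsampling stage, a bottleneck, one upsampling stage, and a residual skip path); real diffusion U-Nets stack several resolutions and inject timestep embeddings and attention at each level.

```python
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.down = nn.Conv2d(4, ch, 3, stride=2, padding=1)          # encoder stage
        self.mid = nn.Conv2d(ch, ch, 3, padding=1)                    # bottleneck
        self.up = nn.ConvTranspose2d(ch, 4, 4, stride=2, padding=1)   # decoder stage

    def forward(self, z):
        h = F.silu(self.down(z))
        h = F.silu(self.mid(h))
        return self.up(h) + z  # skip path back to the input resolution
```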
Classifier-Free Guidance (CFG)
Conditioning method that runs the diffusion model with and without the prompt and extrapolates the noise prediction toward the conditional one by a guidance scale, increasing adherence to the prompt without an external classifier and improving text fidelity.
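A sketch of the guidance combination, assuming a `unet` callable and a `null_emb` embedding of the empty prompt; w = 7.5 is a commonly used default scale.

```python
def cfg_predict(unet, z_t, t, text_emb, null_emb, w=7.5):
    eps_cond = unet(z_t, t, text_emb)    # conditioned on the prompt
    eps_uncond = unet(z_t, t, null_emb)  # conditioned on the empty prompt
    # extrapolate toward the conditional prediction; larger w trades
    # diversity for prompt adherence
    return eps_uncond + w * (eps_cond - eps_uncond)
```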
Stable Diffusion
Famous open-source implementation of the latent diffusion model architecture, combining a VAE, a U-Net, and text conditioning via CLIP for accessible and performant image generation.
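Example usage via the Hugging Face `diffusers` library (exact API details can vary across versions; the checkpoint ID shown is one public example):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```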
Stochastic Score Matching (SDE)
Alternative theoretical framework for diffusion models, interpreting them as solving a stochastic differential equation to learn the gradient of the log data density (the score).
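A sketch of one Euler-Maruyama step of the reverse-time VP-SDE, recovering the score from the noise prediction via score = -eps / sqrt(1 - alpha_bar_t); arguments are assumed to be scalars or broadcastable tensors.

```python
import torch

def reverse_sde_step(x, eps_pred, beta_t, alpha_bar_t, dt):
    # score(x, t) = -eps_theta(x, t) / sqrt(1 - alpha_bar_t)
    score = -eps_pred / (1.0 - alpha_bar_t) ** 0.5
    # reverse drift: f(x, t) - g(t)^2 * score, with f = -0.5*beta*x, g = sqrt(beta)
    drift = -0.5 * beta_t * x - beta_t * score
    z = torch.randn_like(x)
    # step backward in time with step size dt, injecting fresh noise
    return x - drift * dt + (beta_t * dt) ** 0.5 * z
```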
Latent Resampling
Inference technique that revisits the denoising trajectory in latent space, for example by re-injecting noise and denoising again or by adjusting timesteps and guidance, to improve coherence and generation quality.
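A hypothetical resampling loop in the spirit of RePaint-style re-noising, purely illustrative: `scheduler.step` and `scheduler.renoise` are assumed interfaces, not a specific library's API.

```python
def resample_step(z, t, scheduler, unet, text_emb, n_resample=2):
    for _ in range(n_resample):
        eps = unet(z, t, text_emb)
        z_prev = scheduler.step(eps, t, z)  # one step toward t-1
        z = scheduler.renoise(z_prev, t)    # jump back to t (hypothetical API)
    return z_prev  # extra passes let the model harmonize the latent
```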
Time Distillation
Model compression process in which a slow teacher diffusion model supervises a faster student model that generates images of comparable quality in far fewer denoising steps.
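A sketch of progressive (step) distillation under assumed `teacher_step` and `student_step` one-step denoising callables: the student learns to match in one jump what the teacher produces in two.

```python
import torch
import torch.nn.functional as F

def distill_loss(teacher_step, student_step, z_t, t, text_emb):
    with torch.no_grad():
        z_mid = teacher_step(z_t, t, text_emb)         # teacher: t -> t-1
        target = teacher_step(z_mid, t - 1, text_emb)  # teacher: t-1 -> t-2
    pred = student_step(z_t, t, text_emb)              # student jumps t -> t-2
    return F.mse_loss(pred, target)
```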
Consistent Denoising
Family of inference methods built on the probability-flow ordinary differential equation (ODE) underlying diffusion, mapping noisy latents toward clean ones deterministically and enabling high-quality generation in a single step or very few steps.
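For reference, the deterministic DDIM update, a discretization of the underlying probability-flow ODE (the eta = 0 case), as a sketch:

```python
def ddim_step(z_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    # estimate the clean latent from the noise prediction
    x0_hat = (z_t - (1 - alpha_bar_t) ** 0.5 * eps_pred) / alpha_bar_t ** 0.5
    # move directly to the next (coarser) timestep, no fresh noise injected
    return alpha_bar_prev ** 0.5 * x0_hat + (1 - alpha_bar_prev) ** 0.5 * eps_pred
```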
Prompt Tokenization
Preprocessing step where input text is converted into a sequence of numerical identifiers (tokens) that are then transformed into embeddings by the language model (e.g., CLIP) for conditioning.
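Example with the CLIP tokenizer from Hugging Face `transformers` (the checkpoint ID is one public example):

```python
from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
ids = tok("a cat wearing a hat", padding="max_length",
          max_length=77, return_tensors="pt").input_ids
print(ids.shape)  # torch.Size([1, 77]) -- 77 is CLIP's context length
```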
KL Reconstruction Loss
Regularization term added to the reconstruction loss when training an LDM's VAE, measuring the Kullback-Leibler divergence between the learned latent distribution and a prior (typically a standard Gaussian), usually with a very small weight so the latent space stays nearly Gaussian without hurting reconstruction.
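The closed-form KL term for a diagonal Gaussian against a standard normal prior, as a sketch; the 1e-6 weight mirrors the very small KL weight typical of LDM-style VAE training (the exact value is an assumption).

```python
import torch

def kl_divergence(mu, logvar):
    # D_KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions
    return 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar, dim=-1)

# total VAE loss sketch: reconstruction dominates, KL only nudges
# the latent distribution toward the Gaussian prior
# loss = recon_loss + 1e-6 * kl_divergence(mu, logvar).mean()
```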
Textual Embedding Space
High-dimensional vector space where texts (prompts) are represented as embeddings, serving as conditioning to the diffusion model via cross-attention mechanism.
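Example of producing per-token embeddings with the CLIP text encoder via `transformers` (checkpoint ID is one public example); the (1, 77, 768) output is what the cross-attention layers consume as keys and values.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

batch = tok(["a sunset over the ocean"], padding="max_length",
            max_length=77, return_tensors="pt")
with torch.no_grad():
    emb = enc(**batch).last_hidden_state  # (1, 77, 768) per-token embeddings
```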