AI Glossary
The Complete Dictionary of Artificial Intelligence
Latent Diffusion Model
Diffusion architecture that operates in a lower-dimensional latent space, obtained via an auto-encoder, to significantly reduce computational costs while maintaining high image generation quality.
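A minimal inference sketch in PyTorch, assuming hypothetical `unet`, `decoder`, and `scheduler` objects with the interfaces shown (not any specific library's API): sampling starts from Gaussian noise in latent space, and only the final decode touches pixel space.

```python
import torch

@torch.no_grad()
def generate(unet, decoder, scheduler, text_emb, latent_shape):
    z = torch.randn(latent_shape)      # start from pure noise in latent space
    for t in scheduler.timesteps:      # iterate reverse-diffusion timesteps
        eps = unet(z, t, text_emb)     # predict the noise at step t
        z = scheduler.step(eps, t, z)  # remove a portion of that noise
    return decoder(z)                  # decode the clean latent to pixels
```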
Perceptual Encoder
Part of the auto-encoder in an LDM that transforms a high-dimensional image (pixels) into a low-dimensional representation (latent), capturing essential semantic information.
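A toy encoder sketch, assuming a 3x256x256 RGB input and a 4-channel latent (the 8x spatial compression used by Stable Diffusion's VAE); a real perceptual encoder adds residual blocks and outputs distribution parameters rather than a single tensor.

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Three stride-2 convolutions: 3x256x256 image -> 4x32x32 latent."""
    def __init__(self, latent_channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(128, latent_channels, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)  # low-dimensional semantic representation
```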
Cross-Attention Conditioning
Attention mechanism that allows the latent diffusion model to integrate heterogeneous information, such as text (CLIP embeddings), to guide image generation in a flexible and precise manner.
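A minimal single-head cross-attention layer as a sketch; the dimensions (320 for latents, 768 for CLIP text embeddings) are illustrative assumptions. Queries come from the image latents, keys and values from the text.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, latent_dim=320, text_dim=768):
        super().__init__()
        self.to_q = nn.Linear(latent_dim, latent_dim)  # queries from latents
        self.to_k = nn.Linear(text_dim, latent_dim)    # keys from text
        self.to_v = nn.Linear(text_dim, latent_dim)    # values from text

    def forward(self, latents, text_emb):
        q = self.to_q(latents)                         # (B, N_latent, D)
        k = self.to_k(text_emb)                        # (B, N_text, D)
        v = self.to_v(text_emb)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v                                # text-conditioned update
```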
Noise Scheduler
Algorithm defining the variance of noise added at each timestep of the forward process, influencing the convergence speed and final generation quality in LDMs.
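A sketch of the linear beta schedule from the original DDPM formulation; `T` and the range endpoints are the commonly cited defaults, and the closed-form forward-noising step follows the standard q(x_t | x_0) expression.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # noise variance per timestep
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)     # total signal kept after t steps

def add_noise(x0, t, noise):
    # forward process for a scalar timestep index t:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * noise
```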
Noise Regression (Denoising)
Main task of the U-Net diffusion model, which consists of predicting the noise added to a latent at a given timestep, allowing it to be subtracted to progressively denoise the signal.
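One training step of this objective as a sketch, assuming a `unet(z_t, t, text_emb)` callable and the `alpha_bar` table from the scheduler entry above.

```python
import torch
import torch.nn.functional as F

def denoising_loss(unet, z0, text_emb, alpha_bar):
    t = torch.randint(0, len(alpha_bar), (z0.shape[0],))  # random timesteps
    eps = torch.randn_like(z0)                            # true noise
    ab = alpha_bar[t].view(-1, 1, 1, 1)                   # broadcast per sample
    z_t = ab.sqrt() * z0 + (1 - ab).sqrt() * eps          # forward-noised latent
    eps_pred = unet(z_t, t, text_emb)                     # predicted noise
    return F.mse_loss(eps_pred, eps)                      # the "L_simple" objective
```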
Hierarchical U-Net
Neural network architecture shaped like a U, with skip connections between its downsampling and upsampling paths, residual blocks, and attention mechanisms, used as the core of the diffusion model to predict the noise at each denoising step.
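A skeletal sketch of the shape only (one downsampling stage, a bottleneck, one upsampling stage, and a residual skip path); real diffusion U-Nets stack several resolutions and inject timestep embeddings and attention at each level.

```python
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.down = nn.Conv2d(4, ch, 3, stride=2, padding=1)          # encoder stage
        self.mid = nn.Conv2d(ch, ch, 3, padding=1)                    # bottleneck
        self.up = nn.ConvTranspose2d(ch, 4, 4, stride=2, padding=1)   # decoder stage

    def forward(self, z):
        h = F.silu(self.down(z))
        h = F.silu(self.mid(h))
        return self.up(h) + z  # skip path back to the input resolution
```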
Classifier-Free Guidance (CFG)
Conditioning method that runs the diffusion model with and without the prompt and extrapolates the noise prediction toward the conditional one by a guidance scale, increasing adherence to the prompt without an external classifier and improving text fidelity.
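A sketch of the guidance combination, assuming a `unet` callable and a `null_emb` embedding of the empty prompt; w = 7.5 is a commonly used default scale.

```python
def cfg_predict(unet, z_t, t, text_emb, null_emb, w=7.5):
    eps_cond = unet(z_t, t, text_emb)    # conditioned on the prompt
    eps_uncond = unet(z_t, t, null_emb)  # conditioned on the empty prompt
    # extrapolate toward the conditional prediction; larger w trades
    # diversity for prompt adherence
    return eps_uncond + w * (eps_cond - eps_uncond)
```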
Stable Diffusion
Famous open-source implementation of the latent diffusion model architecture, combining a VAE, a U-Net, and text conditioning via CLIP for accessible and performant image generation.
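Example usage via the Hugging Face `diffusers` library (exact API details can vary across versions; the checkpoint ID shown is one public example):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```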
Stochastic Score Matching (SDE)
Alternative theoretical framework for diffusion models, interpreting them as solving a stochastic differential equation to learn the gradient of the log data density (the score).
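A sketch of one Euler-Maruyama step of the reverse-time VP-SDE, recovering the score from the noise prediction via score = -eps / sqrt(1 - alpha_bar_t); arguments are assumed to be scalars or broadcastable tensors.

```python
import torch

def reverse_sde_step(x, eps_pred, beta_t, alpha_bar_t, dt):
    # score(x, t) = -eps_theta(x, t) / sqrt(1 - alpha_bar_t)
    score = -eps_pred / (1.0 - alpha_bar_t) ** 0.5
    # reverse drift: f(x, t) - g(t)^2 * score, with f = -0.5*beta*x, g = sqrt(beta)
    drift = -0.5 * beta_t * x - beta_t * score
    z = torch.randn_like(x)
    # step backward in time with step size dt, injecting fresh noise
    return x - drift * dt + (beta_t * dt) ** 0.5 * z
```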
Latent Resampling
Inference technique that revisits the denoising trajectory in latent space, for example by re-injecting noise and denoising again or by adjusting timesteps and guidance, to improve coherence and generation quality.
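A hypothetical resampling loop in the spirit of RePaint-style re-noising, purely illustrative: `scheduler.step` and `scheduler.renoise` are assumed interfaces, not a specific library's API.

```python
def resample_step(z, t, scheduler, unet, text_emb, n_resample=2):
    for _ in range(n_resample):
        eps = unet(z, t, text_emb)
        z_prev = scheduler.step(eps, t, z)  # one step toward t-1
        z = scheduler.renoise(z_prev, t)    # jump back to t (hypothetical API)
    return z_prev  # extra passes let the model harmonize the latent
```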
Time Distillation
Model compression process in which a slow teacher diffusion model supervises a faster student model that generates images of comparable quality in far fewer denoising steps.
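A sketch of progressive (step) distillation under assumed `teacher_step` and `student_step` one-step denoising callables: the student learns to match in one jump what the teacher produces in two.

```python
import torch
import torch.nn.functional as F

def distill_loss(teacher_step, student_step, z_t, t, text_emb):
    with torch.no_grad():
        z_mid = teacher_step(z_t, t, text_emb)         # teacher: t -> t-1
        target = teacher_step(z_mid, t - 1, text_emb)  # teacher: t-1 -> t-2
    pred = student_step(z_t, t, text_emb)              # student jumps t -> t-2
    return F.mse_loss(pred, target)
```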
Consistent Denoising
Family of inference methods built on the probability-flow ordinary differential equation (ODE) underlying diffusion, mapping noisy latents toward clean ones deterministically and enabling high-quality generation in a single step or very few steps.
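For reference, the deterministic DDIM update, a discretization of the underlying probability-flow ODE (the eta = 0 case), as a sketch:

```python
def ddim_step(z_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    # estimate the clean latent from the noise prediction
    x0_hat = (z_t - (1 - alpha_bar_t) ** 0.5 * eps_pred) / alpha_bar_t ** 0.5
    # move directly to the next (coarser) timestep, no fresh noise injected
    return alpha_bar_prev ** 0.5 * x0_hat + (1 - alpha_bar_prev) ** 0.5 * eps_pred
```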
Prompt Tokenization
Preprocessing step where input text is converted into a sequence of numerical identifiers (tokens) that are then transformed into embeddings by the language model (e.g., CLIP) for conditioning.
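Example with the CLIP tokenizer from Hugging Face `transformers` (the checkpoint ID is one public example):

```python
from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
ids = tok("a cat wearing a hat", padding="max_length",
          max_length=77, return_tensors="pt").input_ids
print(ids.shape)  # torch.Size([1, 77]) -- 77 is CLIP's context length
```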
KL Reconstruction Loss
Regularization term added to the reconstruction loss when training an LDM's VAE, measuring the Kullback-Leibler divergence between the learned latent distribution and a prior (typically a standard Gaussian), usually with a very small weight so the latent space stays nearly Gaussian without hurting reconstruction.
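The closed-form KL term for a diagonal Gaussian against a standard normal prior, as a sketch; the 1e-6 weight mirrors the very small KL weight typical of LDM-style VAE training (the exact value is an assumption).

```python
import torch

def kl_divergence(mu, logvar):
    # D_KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions
    return 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar, dim=-1)

# total VAE loss sketch: reconstruction dominates, KL only nudges
# the latent distribution toward the Gaussian prior
# loss = recon_loss + 1e-6 * kl_divergence(mu, logvar).mean()
```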
Textual Embedding Space
High-dimensional vector space where texts (prompts) are represented as embeddings, serving as conditioning to the diffusion model via cross-attention mechanism.
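Example of producing per-token embeddings with the CLIP text encoder via `transformers` (checkpoint ID is one public example); the (1, 77, 768) output is what the cross-attention layers consume as keys and values.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

batch = tok(["a sunset over the ocean"], padding="max_length",
            max_length=77, return_tensors="pt")
with torch.no_grad():
    emb = enc(**batch).last_hidden_state  # (1, 77, 768) per-token embeddings
```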