AI Glossary
The Complete Dictionary of Artificial Intelligence
Network Pruning
Technique that selectively removes the least important weights or neurons from a diffusion model, producing a sparser and more efficient architecture with minimal impact on performance.
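A minimal sketch of magnitude-based pruning using PyTorch's built-in pruning utilities; the small linear layer is a hypothetical stand-in for part of a diffusion network:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical layer standing in for part of a diffusion U-Net.
layer = nn.Linear(256, 256)

# Zero out the 30% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Bake the pruning mask into the weight tensor permanently.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.1%}")  # -> ~30.0%
```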
Classifier-Guided Denoising
Optimization strategy that uses an external classification model to guide the denoising process, allowing equivalent visual quality to be achieved with fewer computationally expensive denoising steps.
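A sketch of a single guided update, assuming `denoiser` and `classifier` are generic callables rather than any specific library API:

```python
import torch

def classifier_guided_step(x_t, t, y, denoiser, classifier, scale=1.0):
    """One guided update: shift the denoiser's predicted mean along the
    gradient of the classifier's log-probability for the target class y.
    `denoiser` and `classifier` are assumed callables (illustrative only)."""
    x_t = x_t.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x_t, t), dim=-1)
    selected = log_probs[torch.arange(len(y)), y].sum()
    grad = torch.autograd.grad(selected, x_t)[0]
    with torch.no_grad():
        mean = denoiser(x_t, t)        # predicted mean of x_{t-1}
        return mean + scale * grad     # nudge the sample toward class y
```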
Low-Rank Inference
Approach that approximates the model's large weight matrices by products of lower-rank matrices, drastically reducing the number of parameters and matrix multiplication operations during inference.
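A minimal PyTorch sketch of replacing a linear layer with a rank-r factorization obtained from a truncated SVD (illustrative sizes):

```python
import torch
import torch.nn as nn

def low_rank_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate W (out x in) by B @ A with A: (rank, in), B: (out, rank)."""
    W = linear.weight.data
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = torch.diag(S[:rank]) @ Vh[:rank]      # (rank, in)
    B = U[:, :rank]                           # (out, rank)
    first = nn.Linear(linear.in_features, rank, bias=False)
    second = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    first.weight.data = A
    second.weight.data = B
    if linear.bias is not None:
        second.bias.data = linear.bias.data
    return nn.Sequential(first, second)

layer = nn.Linear(1024, 1024)
approx = low_rank_factorize(layer, rank=64)   # ~8x fewer weights
```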
Accelerator Method
Set of techniques aimed at accelerating the diffusion process by skipping intermediate denoising steps, often using regression models to directly predict future steps.
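A toy illustration of the step-skipping idea: keep only every k-th timestep of the full schedule (the predictor model itself is omitted here):

```python
# Full 1000-step schedule, t = 999 .. 0, thinned to strided steps.
full_schedule = list(range(999, -1, -1))
k = 20
strided_schedule = full_schedule[::k]   # [999, 979, ..., 19]
print(len(strided_schedule))            # -> 50 steps instead of 1000
```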
Memory Optimization by Gradient Checkpointing
Memory management technique that discards most intermediate activations during the forward pass and recomputes them as needed during backpropagation, trading a slight increase in computation time for a large reduction in memory usage.
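A minimal sketch using torch.utils.checkpoint; the 16-block stack is a hypothetical stand-in for a deep U-Net:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Deep stack standing in for a U-Net's residual blocks (hypothetical sizes).
model = nn.Sequential(*[nn.Sequential(nn.Linear(512, 512), nn.GELU())
                        for _ in range(16)])
x = torch.randn(8, 512, requires_grad=True)

# Split into 4 segments: only segment boundaries keep their activations;
# the rest are recomputed on the fly during the backward pass.
out = checkpoint_sequential(model, 4, x)
out.sum().backward()
```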
Mixture of Experts (MoE)
Model architecture in which multiple 'experts' (sub-networks) are conditionally activated, increasing model capacity without a proportional increase in the computational cost of a single inference.
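A minimal top-1 routing sketch in PyTorch (illustrative sizes); only the expert chosen by the gate runs for each input:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal top-1 mixture-of-experts layer (illustrative, not production)."""
    def __init__(self, dim=64, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

    def forward(self, x):                       # x: (batch, dim)
        scores = F.softmax(self.gate(x), dim=-1)
        top_w, top_idx = scores.max(dim=-1)     # route each input to 1 expert
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():                      # only the chosen expert runs
                out[mask] = top_w[mask, None] * expert(x[mask])
        return out

moe = TinyMoE()
y = moe(torch.randn(8, 64))
```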
Time-step Distillation
Form of distillation where a student model learns to generate high-quality results using fewer denoising steps than the teacher model, thus directly accelerating the generation process.
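A hedged sketch of a progressive-distillation-style objective: one student step must match two teacher steps. `student` and `teacher` are assumed callables mapping a noisy sample and timestep to a less-noisy sample:

```python
import torch

def distillation_loss(student, teacher, x_t, t, dt):
    """Sketch: the student's single jump from t to t - 2*dt should
    reproduce two consecutive teacher steps (assumed callables)."""
    with torch.no_grad():
        x_mid = teacher(x_t, t)           # teacher step 1: t -> t - dt
        target = teacher(x_mid, t - dt)   # teacher step 2: t - dt -> t - 2*dt
    pred = student(x_t, t)                # student jumps t -> t - 2*dt directly
    return torch.mean((pred - target) ** 2)
```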
Efficient Stochastic Reparameterization
Optimization of the noising and denoising process that uses the reparameterization trick to reduce gradient variance and the number of samples needed, making each diffusion step more stable and less costly.
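The core trick in a minimal sketch: sampling is rewritten as a deterministic function of the parameters plus independent noise, so gradients flow through mu and log_var:

```python
import torch

def reparameterized_sample(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, I).
    Gradients flow through mu and log_var, giving lower-variance gradient
    estimates than score-function (REINFORCE-style) estimators."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps
```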
Feature Caching
Strategy of caching intermediate feature maps for conditioning inputs that remain constant across steps (e.g., text embeddings), avoiding their recomputation at each denoising step and thus reducing the overall computational load.
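A minimal memoization sketch; `encoder` is an assumed callable mapping a prompt to its embedding tensor:

```python
import torch

class CachedTextEncoder:
    """Wraps a text encoder and memoizes its output per prompt, so the
    embedding is computed once instead of at every denoising step."""
    def __init__(self, encoder):
        self.encoder = encoder            # assumed callable: str -> Tensor
        self._cache = {}

    def __call__(self, prompt: str) -> torch.Tensor:
        if prompt not in self._cache:
            self._cache[prompt] = self.encoder(prompt)
        return self._cache[prompt]
```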
Deployment on Tensor Processing Unit (TPU)
Adaptation of diffusion model architectures to leverage the massively parallel matrix operations of TPUs, optimizing data flows and computation kernels for very high-speed inference.
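A minimal sketch assuming the torch_xla package on an actual TPU host; operations are staged lazily as an XLA graph and executed when the step is marked:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()          # the TPU core visible to this process
x = torch.randn(4, 64, device=device)
w = torch.randn(64, 64, device=device)
y = x @ w                         # recorded lazily into an XLA graph
xm.mark_step()                    # compile and run the pending graph on the TPU
```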
Quality-Speed Trade-off by Scheduler
Use of different noise schedulers (e.g., DDIM, DPM-Solver) that control the number of denoising steps, offering fine-grained control over the trade-off between image quality and generation speed.
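A sketch using the Hugging Face diffusers library; the model identifier and the 20-step setting are illustrative assumptions:

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap in DPM-Solver++: comparable quality at far fewer steps than
# the pipeline's default sampler.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
image = pipe("a lighthouse at dusk", num_inference_steps=20).images[0]
```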
Convolution Kernel Fusion
Optimization technique that fuses successive layers (e.g., Conv + BatchNorm + ReLU) into a single operation, reducing latency and memory accesses on inference hardware.
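A minimal sketch of the classic Conv + BatchNorm fold: the normalization statistics are absorbed into the convolution's weights and bias, so inference runs one op instead of two (dilation/groups omitted for brevity):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm into the preceding convolution for inference."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, conv.stride, conv.padding, bias=True)
    scale = (bn.weight / torch.sqrt(bn.running_var + bn.eps)).detach()
    fused.weight.data = conv.weight.data * scale[:, None, None, None]
    b = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = bn.bias.data + scale * (b - bn.running_mean)
    return fused

conv, bn = nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8).eval()
fused = fuse_conv_bn(conv, bn)
x = torch.randn(1, 3, 16, 16)
assert torch.allclose(bn(conv(x)), fused(x), atol=1e-5)
```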
Consistency Latent Diffusion Model
Variant of a latent diffusion model trained to map any point on the noise trajectory directly back to the data, enabling generation in a single step or very few steps and dramatically improving computational efficiency.
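A hedged sketch of single-step generation; `f` is an assumed trained consistency function, and the sigma_max value follows common EDM-style setups:

```python
import torch

def consistency_generate(f, shape, sigma_max=80.0):
    """One-step generation: f maps a noisy point (x, sigma) anywhere on
    the trajectory straight to a data sample (f is an assumed callable)."""
    x_T = torch.randn(shape) * sigma_max   # start from pure noise
    return f(x_T, sigma_max)               # single evaluation -> sample
```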
Hyperparameter Grid Search Optimization
Process of systematically exploring hyperparameter configurations (e.g., learning rate, number of attention heads) to identify the most performant model in terms of quality/computational cost ratio.
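A minimal sketch with a hypothetical search space and a stand-in scoring function in place of an actual train-and-evaluate loop:

```python
import itertools

# Hypothetical search space (values are illustrative).
grid = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "attention_heads": [4, 8],
}

def score(config):
    """Stand-in for training a model and measuring quality vs. cost."""
    return -abs(config["learning_rate"] - 3e-4) - config["attention_heads"] * 0.01

best = max(
    (dict(zip(grid, values)) for values in itertools.product(*grid.values())),
    key=score,
)
print(best)
```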
Asynchronous Pipeline Inference
Deployment architecture where denoising steps are processed in parallel on different computing units, masking latency and increasing processing throughput for real-time diffusion applications.
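A minimal sketch of the pipelining pattern using threads and queues; the two lambdas are hypothetical stand-ins for early and late denoising stages running on separate devices:

```python
import queue
import threading

def stage(worker, inbox, outbox):
    """Run one pipeline stage: pull an item, process it, pass it on."""
    while (item := inbox.get()) is not None:
        outbox.put(worker(item))
    outbox.put(None)                       # propagate the shutdown signal

q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(lambda x: x + 1, q_in, q_mid)).start()
threading.Thread(target=stage, args=(lambda x: x * 2, q_mid, q_out)).start()

for i in range(4):                         # stages overlap across requests,
    q_in.put(i)                            # masking per-stage latency
q_in.put(None)
print([q_out.get() for _ in range(4)])     # -> [2, 4, 6, 8]
```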