AI Glossary

The complete dictionary of Artificial Intelligence

162 categories, 2,032 subcategories, 23,060 terms

Video Diffusion Model

Generative architecture that applies the diffusion process to spatio-temporal data: noise is gradually added to the frames of a video sequence, and a network learns to reverse that corruption so it can reconstruct or generate coherent videos.
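
The noising half of this process can be sketched in a few lines. The example below assumes the widely used closed form x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps with a linear beta schedule; the function names, the schedule endpoints, and the toy video sizes are illustrative choices, not taken from any specific model.

```python
import math
import random

def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def add_noise(video, t, alpha_bar, rng):
    """Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps.

    `video` is a list of frames, each frame a flat list of pixel values.
    Returns the noised video and the noise that was added, which is the
    usual regression target for the denoising network.
    """
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    noise = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in video]
    noisy = [[signal * x + sigma * e for x, e in zip(frame, eps)]
             for frame, eps in zip(video, noise)]
    return noisy, noise

rng = random.Random(0)
video = [[0.5] * 4 for _ in range(3)]  # 3 frames of 4 pixels each (toy sizes)
alpha_bar = linear_alpha_bar(1000)
noisy, noise = add_noise(video, t=999, alpha_bar=alpha_bar, rng=rng)
```

At the last step almost none of the original signal remains, which is why sampling can start from pure noise.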

Spatio-Temporal Latent Diffusion

Variant of video diffusion models that operates in a compressed latent space, reducing computational complexity by applying the noise addition and denoising process on low-dimensional representations rather than on raw pixels of each frame.
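
To give a feel for the savings, the arithmetic below compares the number of values in a raw clip with the number in its latent. The downsampling factors (8x spatial, 4x temporal) and the 4 latent channels are assumed for illustration only; real autoencoders differ.

```python
def element_count(frames, height, width, channels):
    """Number of scalar values in a frames x height x width x channels tensor."""
    return frames * height * width * channels

def latent_shape(frames, height, width,
                 spatial_factor=8, temporal_factor=4, latent_channels=4):
    """Shape of the latent under the assumed (hypothetical) compression factors."""
    return (frames // temporal_factor,
            height // spatial_factor,
            width // spatial_factor,
            latent_channels)

pixels = element_count(16, 256, 256, 3)             # raw RGB clip
latent = element_count(*latent_shape(16, 256, 256))  # compressed representation
ratio = pixels / latent
```

Under these assumptions the denoiser handles 192 times fewer values per clip than it would in pixel space.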

3D Attention

Attention mechanism that simultaneously processes spatial (height, width) and temporal (time) dimensions of a video, allowing the model to weight the importance of different regions across different moments to capture spatio-temporal dependencies.
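
A minimal sketch of the idea: flatten every (time, height, width) position into one token list and let each token attend to all others. For brevity the tokens serve as their own queries, keys, and values; a real layer would apply learned projections and multiple heads.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def flatten_tokens(video):
    """Flatten a [T][H][W] grid of feature vectors into one token list."""
    return [vec for frame in video for row in frame for vec in row]

def joint_attention(tokens):
    """Self-attention in which every token attends to every (t, h, w) position."""
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out = [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy video: 2 frames, 2x2 positions, 3 features per position.
video = [[[[float(t + h + w + c) for c in range(3)]
           for w in range(2)] for h in range(2)] for t in range(2)]
tokens = flatten_tokens(video)
outputs, weights = joint_attention(tokens)
```

The token count is T * H * W, so the attention matrix grows with its square; this cost is the usual reason for factorizing attention into separate spatial and temporal passes.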

Time Embedding

Technique for encoding temporal information (diffusion step, position in the sequence) as vectors that are injected into the network, guiding the generation process to maintain consistency and movement over time.
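
One common encoding is the sinusoidal embedding, sketched below. Using the same function for both the diffusion step and the frame index, and summing the two vectors, is an assumption made here for illustration; models often pass each through its own small network first. The dimension is assumed to be even.

```python
import math

def sinusoidal_embedding(position, dim, max_period=10000.0):
    """Return [sin(p * f_0), ..., sin(p * f_{h-1}), cos(p * f_0), ..., cos(p * f_{h-1})].

    h = dim // 2 and the frequencies f_i decay geometrically from 1 toward 1 / max_period.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return ([math.sin(position * f) for f in freqs]
            + [math.cos(position * f) for f in freqs])

step_emb = sinusoidal_embedding(position=250, dim=8)   # diffusion step
frame_emb = sinusoidal_embedding(position=3, dim=8)    # frame index in the clip
combined = [a + b for a, b in zip(step_emb, frame_emb)]
```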

Conditional Denoising by Previous Frame

Strategy where the noise prediction for a video frame is conditioned on the denoised version of the previous frame, promoting continuity and temporal consistency between successive frames of the generated sequence.

3D U-Net Architecture

Convolutional neural network structure adapted for video data, combining encoder and decoder paths linked by skip connections with 3D convolutional residual blocks to capture multi-scale spatial and temporal context during denoising.

Spatio-Temporal Latent Space

Compressed and abstract representation of a video sequence, where spatial and temporal information is encoded in a low-dimensional vector or feature map, serving as the basis for video generation or manipulation.

Video Classifier-Free Guidance (CFG)

Method for controlling video generation without an explicit classifier: a single model is trained both with and without its conditioning signal (e.g., text), and at sampling time its conditional and unconditional predictions are combined, typically by extrapolating from the unconditional prediction toward the conditional one, to trade prompt adherence against diversity.
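
The combination step is commonly written as eps = eps_uncond + w * (eps_cond - eps_uncond), where w is the guidance scale. A minimal sketch on flat lists of noise values (the function name is illustrative):

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and conditional noise predictions.

    guidance_scale = 0 returns the unconditional prediction, 1 returns the
    conditional one, and values above 1 push further toward the condition.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]
guided = classifier_free_guidance(eps_uncond, eps_cond, guidance_scale=2.0)
```

Larger scales usually follow the prompt more closely at the cost of variety in the samples.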

Temporal Diffusion Scheduling

Planning of the number of denoising steps allocated to each frame or temporal segment; the allocation can be uniform or adaptive, based on motion complexity, to optimize the quality and overall consistency of the generated video.
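
As a rough illustration of an adaptive allocation, the sketch below splits a fixed step budget across segments in proportion to a per-segment motion score. The scoring, the proportional rule, and the `min_steps` floor are hypothetical choices made for this example, not an established scheduler.

```python
def allocate_steps(motion_scores, total_steps, min_steps=1):
    """Split a step budget across segments in proportion to their motion score."""
    n = len(motion_scores)
    spare = total_steps - min_steps * n
    if spare < 0:
        raise ValueError("budget too small for min_steps per segment")
    total_motion = sum(motion_scores)
    if total_motion == 0:
        shares = [spare / n] * n  # no motion information: uniform split
    else:
        shares = [spare * m / total_motion for m in motion_scores]
    steps = [min_steps + int(s) for s in shares]
    # Hand out the steps lost to rounding down, largest fractional part first.
    leftover = total_steps - sum(steps)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        steps[i] += 1
    return steps

adaptive = allocate_steps([1.0, 3.0], total_steps=40, min_steps=10)
uniform = allocate_steps([0.0, 0.0, 0.0], total_steps=10)
```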

Diffusion-based Temporal Super-Resolution

Application of diffusion models to increase the frame rate (fps) of a video, generating coherent intermediate frames that realistically interpolate motion and changes between existing frames.
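
The sketch below shows only the bookkeeping around such a model: where the existing frames sit in the higher-frame-rate output and which positions are left for the model to generate. It assumes the first and last input frames remain the endpoints of the output; the function name is illustrative.

```python
def interpolation_layout(num_frames, factor):
    """Positions of known frames after raising the frame rate by `factor`.

    Returns (output_length, known_mask), where known_mask[i] is 1 for an
    existing frame and 0 for a frame the model has to generate.
    """
    output_length = (num_frames - 1) * factor + 1
    known_mask = [1 if i % factor == 0 else 0 for i in range(output_length)]
    return output_length, known_mask

length, mask = interpolation_layout(num_frames=3, factor=2)
```

The mask can then serve as conditioning, so the denoiser only has to produce the in-between frames.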

Video Inpainting by Diffusion

Process of filling missing or masked areas in a video sequence using a diffusion model, which generates pixels that are spatially and temporally coherent based on the context of surrounding frames.
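
One common sampling-time approach is to re-impose the known content after every denoising step, so the model only decides the masked region. The blend below is a sketch of that single step; it assumes the known content has already been noised to the same level as the model output.

```python
def blend_known_region(generated, known, mask):
    """Keep model output where mask == 1 (missing) and known content where mask == 0.

    All three arguments are lists of frames, each frame a flat list of values.
    """
    return [[m * g + (1 - m) * k for g, k, m in zip(gf, kf, mf)]
            for gf, kf, mf in zip(generated, known, mask)]

generated = [[9, 9], [8, 8]]   # model output for two frames of two pixels
known = [[1, 2], [3, 4]]       # observed video
mask = [[1, 0], [0, 1]]        # 1 marks pixels to fill in
blended = blend_known_region(generated, known, mask)
```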

Latent Motion Modeling

Technique where motion in a video is modeled and generated directly in the latent space, often by predicting displacements or transformations between latent codes of successive frames, before decoding them into images.

Temporal Consistency by Constraint

Approach that adds an explicit penalty or constraint in the model's loss function to discourage appearance changes (e.g., color, texture) unrelated to motion between adjacent frames, promoting visual stability.
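
A deliberately simplified version of such a penalty is the mean squared difference between adjacent frames, added to the main loss with a weight. This sketch ignores motion entirely, so it would also penalize legitimate movement; a practical variant would first align the frames with an estimated motion field. The names and the weight are illustrative.

```python
def temporal_consistency_penalty(video):
    """Mean squared difference between each frame and the next one."""
    total, count = 0.0, 0
    for prev, nxt in zip(video, video[1:]):
        for a, b in zip(prev, nxt):
            total += (b - a) ** 2
            count += 1
    return total / count if count else 0.0

def total_loss(denoising_loss, video, weight=0.1):
    """Main objective plus the weighted temporal penalty."""
    return denoising_loss + weight * temporal_consistency_penalty(video)

static_clip = [[0.2, 0.4], [0.2, 0.4], [0.2, 0.4]]
flickering_clip = [[0.0, 0.0], [1.0, 1.0]]
```

A static clip incurs no penalty, while a clip whose pixels jump between frames does.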

Spatio-Temporal Noise Decomposition

Advanced method where the noise added and predicted by the model is decomposed into a spatial component (appearance) and a temporal component (motion), allowing finer control and more robust generation of dynamic videos.
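
One simple way to realize such a split, assumed here purely for illustration, is to mix a noise map shared by all frames with independent per-frame noise: eps_f = sqrt(r) * eps_shared + sqrt(1 - r) * eps_f_independent. The mixing ratio `r` is a hypothetical parameter.

```python
import math
import random

def mixed_video_noise(num_frames, frame_size, shared_ratio, rng):
    """Sample per-frame noise as a mix of shared and independent components.

    The shared part is identical in every frame, the independent part varies
    from frame to frame. The square-root weights keep each value at unit variance.
    """
    shared = [rng.gauss(0.0, 1.0) for _ in range(frame_size)]
    a = math.sqrt(shared_ratio)
    b = math.sqrt(1.0 - shared_ratio)
    return [[a * s + b * rng.gauss(0.0, 1.0) for s in shared]
            for _ in range(num_frames)]

rng = random.Random(0)
fully_shared = mixed_video_noise(3, 5, shared_ratio=1.0, rng=rng)
independent = mixed_video_noise(3, 5, shared_ratio=0.0, rng=rng)
```

With a ratio of 1 every frame receives the same noise; with 0 the frames are noised independently, as in a plain image diffusion model applied per frame.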

Auto-Regression on Diffusion Latents

Hybrid strategy that generates a video auto-regressively, frame by frame, where each latent frame is produced by a diffusion sampling process conditioned on the previous latent frames, combining the consistency of auto-regression with the quality of diffusion.
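
The control flow can be sketched as an outer loop over frames with a sliding context window. `sample_frame` is a stand-in for a trained conditional sampler and is entirely hypothetical; it only mimics the interface (context in, one latent frame out).

```python
import random

def sample_frame(context, frame_size, num_steps, rng):
    """Stand-in for a conditional diffusion sampler (not a trained model).

    Starts from Gaussian noise and, at each step, moves halfway toward the
    average of the context frames (or toward zero when there is no context).
    """
    if context:
        target = [sum(f[i] for f in context) / len(context) for i in range(frame_size)]
    else:
        target = [0.0] * frame_size
    x = [rng.gauss(0.0, 1.0) for _ in range(frame_size)]
    for _ in range(num_steps):
        x = [xi + 0.5 * (ti - xi) for xi, ti in zip(x, target)]
    return x

def generate_video(num_frames, frame_size, context_len=2, num_steps=10, seed=0):
    """Generate latent frames one at a time, each conditioned on recent ones."""
    rng = random.Random(seed)
    latents = []
    for _ in range(num_frames):
        context = latents[-context_len:]  # only the most recent latent frames
        latents.append(sample_frame(context, frame_size, num_steps, rng))
    return latents

latents = generate_video(num_frames=5, frame_size=4)
```

Limiting the context to the last few frames keeps the cost per frame constant, at the price of possible drift over long sequences.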

Temporal Feature Normalization

Normalization layer applied along the temporal dimension of feature maps in a 3D U-Net, stabilizing training by keeping the distribution of activations consistent across the frames of the sequence.
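
A minimal sketch of the core computation, assuming features indexed as `features[t][c]` (frame, channel) and omitting the learned scale and shift that a trainable layer would normally add:

```python
import math

def temporal_normalize(features, eps=1e-5):
    """Normalize each channel to zero mean and unit variance over the time axis."""
    num_frames = len(features)
    num_channels = len(features[0])
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for c in range(num_channels):
        values = [features[t][c] for t in range(num_frames)]
        mean = sum(values) / num_frames
        var = sum((v - mean) ** 2 for v in values) / num_frames
        scale = 1.0 / math.sqrt(var + eps)  # eps guards against zero variance
        for t in range(num_frames):
            out[t][c] = (values[t] - mean) * scale
    return out

features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # 3 frames, 2 channels
normalized = temporal_normalize(features)
```

After normalization both channels have the same statistics across frames, even though their original scales differ by a factor of ten.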
