
AI Glossary

The Complete Artificial Intelligence Dictionary

162 Categories · 2,032 Subcategories · 23,060 Terms

Video Diffusion Model

Generation architecture that applies the diffusion process to spatio-temporal data, gradually adding noise to the frames of a video sequence and then learning to denoise them in order to reconstruct or create coherent videos.
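
The forward (noising) half of this process can be sketched in a few lines. The clip shape and linear variance schedule below are illustrative assumptions, not any particular model's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": (frames, height, width) — illustrative shape only.
video = rng.standard_normal((8, 16, 16))

# Linear variance schedule over T diffusion steps (a common simple choice).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) in closed form for the whole clip at once."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

noise = rng.standard_normal(video.shape)
x_t = q_sample(video, t=50, noise=noise)
# A denoising network would then be trained to predict `noise` from (x_t, t).
```

Because the closed form operates on the whole array at once, the same two lines noise an image, a clip, or a batch of clips; only the tensor shape changes.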


Spatio-Temporal Latent Diffusion

Variant of video diffusion models that operates in a compressed latent space, reducing computational complexity by applying the noising and denoising process to low-dimensional representations rather than to the raw pixels of each frame.


3D Attention

Attention mechanism that simultaneously processes spatial (height, width) and temporal (time) dimensions of a video, allowing the model to weight the importance of different regions across different moments to capture spatio-temporal dependencies.
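
A minimal way to realize this is to flatten all (time, height, width) positions into one token sequence and attend over it jointly. The sketch below uses identity Q/K/V projections and a single head purely for illustration; real models learn these projections.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_3d_attention(feats):
    """Single-head attention over every (t, h, w) position jointly.

    feats: (T, H, W, C) feature map. All T*H*W positions attend to each
    other, so spatial and temporal dependencies are captured in one step.
    """
    T, H, W, C = feats.shape
    tokens = feats.reshape(T * H * W, C)      # flatten space-time into tokens
    q, k, v = tokens, tokens, tokens          # identity projections (sketch only)
    scores = q @ k.T / np.sqrt(C)             # (N, N) pairwise weights
    return (softmax(scores) @ v).reshape(T, H, W, C)

out = full_3d_attention(np.random.default_rng(1).standard_normal((4, 8, 8, 32)))
```

The quadratic cost in T·H·W is why many architectures factorize this into separate spatial and temporal attention passes instead of the full joint form shown here.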


Time Embedding

Technique for encoding temporal information (diffusion step, position in the sequence) as vectors that are injected into the network, guiding the generation process to maintain consistency and movement over time.
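
A common concrete form is the sinusoidal embedding, sketched below for a scalar timestep; the dimension and frequency base are conventional choices, not fixed by the definition.

```python
import numpy as np

def timestep_embedding(t, dim):
    """Sinusoidal embedding of a scalar timestep (diffusion step or frame index).

    Frequencies span several orders of magnitude so both coarse and fine
    temporal positions stay distinguishable; `dim` must be even here.
    """
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

emb = timestep_embedding(t=37, dim=128)
```

The resulting vector is typically passed through a small MLP and added to (or modulates) feature maps at every block of the denoising network.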


Conditional Denoising by Previous Frame

Strategy in which the noise prediction for a video frame is conditioned on the denoised version of the preceding frame, enforcing strong continuity and temporal consistency between successive images in the generated sequence.


3D U-Net Architecture

Convolutional neural network structure adapted for video data, combining encoder-decoder paths with 3D residual connections to effectively capture multi-scale spatial and temporal contexts during denoising.


Spatio-Temporal Latent Space

Compressed and abstract representation of a video sequence, where spatial and temporal information is encoded in a low-dimensional vector or feature map, serving as the basis for video generation or manipulation.


Video Classifier-Free Guidance (CFG)

Method for controlling video generation without an explicit classifier: the model is trained on both conditional (e.g., text-conditioned) and unconditional data, and at sampling time their noise predictions are combined with a guidance scale that adjusts adherence to the prompt while preserving diversity.
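
The combination rule itself is a one-liner; the guidance scale of 7.5 below is just a commonly seen illustrative value.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: push the prediction along the conditional
    direction. Scale 1.0 recovers the purely conditional prediction;
    larger scales trade diversity for prompt adherence."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(2)
eps_u = rng.standard_normal((8, 16, 16))   # unconditional noise prediction
eps_c = rng.standard_normal((8, 16, 16))   # text-conditional noise prediction
guided = cfg_combine(eps_u, eps_c, guidance_scale=7.5)
```

Note that scales above 1.0 extrapolate past the conditional prediction rather than interpolate between the two, which is what sharpens prompt adherence at the cost of variety.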


Temporal Diffusion Scheduling

Planning of the number of denoising steps allocated to each frame or temporal segment; the schedule can be uniform or adapted to motion complexity in order to optimize the quality and overall consistency of the generated video.


Diffusion-based Temporal Super-Resolution

Application of diffusion models to increase the frame rate (fps) of a video, generating coherent intermediate frames that realistically interpolate motion and changes between existing frames.


Video Inpainting by Diffusion

Process of filling missing or masked areas in a video sequence using a diffusion model, which generates pixels that are spatially and temporally coherent based on the context of surrounding frames.


Latent Motion Modeling

Technique where motion in a video is modeled and generated directly in the latent space, often by predicting displacements or transformations between latent codes of successive frames, before decoding them into images.


Temporal Consistency by Constraint

Approach that adds an explicit penalty or constraint in the model's loss function to discourage appearance changes (e.g., color, texture) unrelated to motion between adjacent frames, promoting visual stability.
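
In its rawest form the constraint is just a penalty on frame-to-frame differences, added to the training loss with a weight. The sketch below omits motion compensation, which a practical version would need so legitimate motion is not penalized.

```python
import numpy as np

def temporal_consistency_penalty(frames):
    """Mean squared difference between adjacent frames — a simple explicit
    constraint that discourages appearance flicker between neighbours.
    frames: (T, H, W) array; returns a scalar penalty."""
    diffs = frames[1:] - frames[:-1]
    return float(np.mean(diffs ** 2))

static_clip = np.ones((8, 16, 16))                          # identical frames
noisy_clip = np.random.default_rng(3).standard_normal((8, 16, 16))

# In training this would appear as, e.g.:
# total_loss = reconstruction_loss + lambda_tc * temporal_consistency_penalty(pred)
```

A perfectly static clip incurs zero penalty, while uncorrelated frames incur a large one, which is exactly the gradient signal that promotes visual stability.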


Spatio-Temporal Noise Decomposition

Advanced method where the noise added and predicted by the model is decomposed into a spatial component (appearance) and a temporal component (motion), allowing finer control and more robust generation of dynamic videos.


Auto-Regression on Diffusion Latents

Hybrid strategy that generates a video auto-regressively frame by frame, where each latent frame is produced by a diffusion step conditioned on previous latent frames, combining the consistency of auto-regression and the quality of diffusion.
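
The control flow of this hybrid can be sketched with a stand-in denoiser; `denoise_step` below is a hypothetical one-step stub, whereas a real model would run a full multi-step diffusion per frame.

```python
import numpy as np

rng = np.random.default_rng(5)

def denoise_step(z_noisy, z_prev):
    """Stand-in for a learned diffusion denoiser conditioned on the previous
    latent frame (hypothetical stub; a real model iterates over noise levels)."""
    return 0.5 * z_noisy + 0.5 * z_prev

def generate_clip(num_frames, latent_dim):
    frames = [rng.standard_normal(latent_dim)]        # seed latent frame
    for _ in range(num_frames - 1):
        z = rng.standard_normal(latent_dim)           # start from pure noise
        frames.append(denoise_step(z, frames[-1]))    # condition on previous frame
    return np.stack(frames)

clip = generate_clip(num_frames=6, latent_dim=16)
```

The loop structure is the point: each latent frame is produced only after its predecessor, so conditioning information flows strictly forward in time.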


Temporal Feature Normalization

Normalization layer applied to the temporal dimension of feature maps in a 3D U-Net, stabilizing training by ensuring that the distribution of activations remains consistent across different temporal stages of the sequence.
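
A bare-bones version of such a layer, without the learnable scale and shift a real implementation would add, can be written as:

```python
import numpy as np

def temporal_norm(feats, eps=1e-5):
    """Normalize activations along the temporal axis of a (T, C, H, W) map,
    so each channel/position has zero mean and unit variance across time."""
    mean = feats.mean(axis=0, keepdims=True)
    var = feats.var(axis=0, keepdims=True)
    return (feats - mean) / np.sqrt(var + eps)

x = np.random.default_rng(4).standard_normal((8, 32, 4, 4)) * 3.0 + 1.0
y = temporal_norm(x)
```

Normalizing over the time axis (rather than the batch or channel axis) is what keeps activation statistics comparable across different temporal stages of the sequence.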
