AI Glossary
The Complete Dictionary of Artificial Intelligence
Video Diffusion Model
Generative architecture that applies the diffusion process to spatio-temporal data, gradually adding noise to the frames of a video sequence and then learning to denoise them in order to reconstruct or create coherent videos.
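A minimal sketch of the forward (noising) step on a 5-D video tensor, assuming a standard DDPM-style schedule; the tensor sizes and schedule values are illustrative, not taken from any particular system.

```python
import torch

# Forward diffusion on a video tensor (batch, channels, frames, height, width):
# every frame of the clip is noised according to a shared variance schedule.
def add_noise(video, t, alphas_cumprod):
    """Return the noised video x_t and the noise used at timestep t."""
    noise = torch.randn_like(video)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1, 1)        # broadcast over C, T, H, W
    x_t = a_bar.sqrt() * video + (1.0 - a_bar).sqrt() * noise
    return x_t, noise

# Example: a batch of two 8-frame 64x64 RGB clips, linear beta schedule, 1000 steps.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
video = torch.randn(2, 3, 8, 64, 64)
t = torch.randint(0, 1000, (2,))
x_t, noise = add_noise(video, t, alphas_cumprod)
```

The denoiser is then trained to predict `noise` from `x_t` and `t`, which is what makes reconstruction and generation possible.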
Spatio-Temporal Latent Diffusion
Variant of video diffusion models that operates in a compressed latent space, reducing computational complexity by applying the noise addition and denoising process on low-dimensional representations rather than on raw pixels of each frame.
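A minimal sketch of the idea, assuming a small stand-in convolutional encoder in place of the pretrained VAE a real system would use; layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# Frames are compressed to a low-dimensional latent map, and the noising and
# denoising process then operates on those latents instead of raw pixels.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.SiLU(),
    nn.Conv2d(64, 4, 4, stride=2, padding=1),             # 4-channel latent, 4x smaller
)

video = torch.randn(1, 3, 8, 64, 64)                       # (B, C, T, H, W)
b, c, t, h, w = video.shape
frames = video.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
latents = encoder(frames)                                  # (B*T, 4, 16, 16)
latents = latents.reshape(b, t, 4, 16, 16).permute(0, 2, 1, 3, 4)

# Noise is now added in the compressed latent space.
noise = torch.randn_like(latents)
a_bar = torch.tensor(0.7)                                  # example cumulative alpha
noisy_latents = a_bar.sqrt() * latents + (1 - a_bar).sqrt() * noise
```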
3D Attention
Attention mechanism that simultaneously processes spatial (height, width) and temporal (time) dimensions of a video, allowing the model to weight the importance of different regions across different moments to capture spatio-temporal dependencies.
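A minimal sketch of full 3-D attention, flattening spatial and temporal positions into a single token sequence so every (t, h, w) location can attend to every other; sizes are illustrative.

```python
import torch
import torch.nn as nn

class Attention3D(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        tokens = x.permute(0, 2, 3, 4, 1).reshape(b, t * h * w, c)
        out, _ = self.attn(tokens, tokens, tokens)          # full spatio-temporal attention
        return out.reshape(b, t, h, w, c).permute(0, 4, 1, 2, 3)

x = torch.randn(1, 64, 4, 8, 8)                # 4 frames of an 8x8 feature map
y = Attention3D(64)(x)                         # same shape, spatio-temporally mixed
```

In practice many models factorize this into separate spatial and temporal attention layers to reduce the quadratic cost of attending over all T*H*W tokens at once.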
Time Embedding
Technique for encoding temporal information (diffusion step, position in the sequence) as vectors that are injected into the network, guiding the generation process to maintain consistency and movement over time.
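A minimal sketch of a sinusoidal time embedding, a common choice for encoding the diffusion timestep (the same construction can encode a frame's position in the clip); the dimensionality is illustrative.

```python
import math
import torch

def time_embedding(timesteps, dim):
    """Map integer timesteps of shape (B,) to vectors of size dim."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = timesteps.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

emb = time_embedding(torch.tensor([0, 250, 999]), 128)     # (3, 128)
# The embedding is typically passed through a small MLP and injected into the
# network's feature maps at every resolution.
```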
Conditional Denoising by Previous Frame
Strategy where the noise prediction for a video frame is conditioned by the denoised version of the previous frame, ensuring strong continuity and temporal consistency between successive images in the generated sequence.
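A minimal sketch, assuming a hypothetical stand-in denoiser `eps_model`; the previous, already-denoised frame is concatenated with the noisy current frame along the channel axis as conditioning.

```python
import torch
import torch.nn as nn

eps_model = nn.Conv2d(6, 3, 3, padding=1)      # stand-in for a real noise-prediction network

def denoise_step(noisy_frame, prev_clean_frame):
    cond_input = torch.cat([noisy_frame, prev_clean_frame], dim=1)   # (B, 6, H, W)
    return eps_model(cond_input)               # predicted noise for the current frame

noisy = torch.randn(1, 3, 64, 64)
prev = torch.randn(1, 3, 64, 64)               # previous frame, already denoised
eps_pred = denoise_step(noisy, prev)
```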
3D U-Net Architecture
Convolutional neural network structure adapted for video data, combining encoder-decoder paths with 3D convolutions and residual (skip) connections to effectively capture multi-scale spatial and temporal context during denoising.
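A minimal sketch of a tiny 3-D U-Net with one downsampling stage and a skip connection; channel counts and kernel sizes are illustrative.

```python
import torch
import torch.nn as nn

class TinyUNet3D(nn.Module):
    def __init__(self, ch=3, base=32):
        super().__init__()
        self.enc = nn.Conv3d(ch, base, 3, padding=1)
        self.down = nn.Conv3d(base, base * 2, 3, stride=(1, 2, 2), padding=1)   # halve H, W only
        self.mid = nn.Conv3d(base * 2, base * 2, 3, padding=1)
        self.up = nn.ConvTranspose3d(base * 2, base, (1, 4, 4),
                                     stride=(1, 2, 2), padding=(0, 1, 1))
        self.dec = nn.Conv3d(base * 2, ch, 3, padding=1)
        self.act = nn.SiLU()

    def forward(self, x):                      # x: (B, C, T, H, W)
        h1 = self.act(self.enc(x))
        h2 = self.act(self.mid(self.act(self.down(h1))))
        u = self.act(self.up(h2))
        return self.dec(torch.cat([u, h1], dim=1))          # skip connection

x = torch.randn(1, 3, 8, 32, 32)               # noisy 8-frame clip
eps_pred = TinyUNet3D()(x)                     # predicted noise, same shape as the input
```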
Spatio-Temporal Latent Space
Compressed and abstract representation of a video sequence, where spatial and temporal information is encoded in a low-dimensional vector or feature map, serving as the basis for video generation or manipulation.
Video Classifier-Free Guidance (CFG)
Method for controlling video generation without an explicit classifier, by training a model on both conditional (e.g., text) and unconditional data, then interpolating between their predictions to adjust adherence to the prompt while preserving diversity.
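A minimal sketch of the guidance step at sampling time; `eps_model` here is a dummy stand-in for a real text-conditioned denoiser, and the guidance scale is an illustrative value.

```python
import torch

def eps_model(x_t, t, cond):                   # dummy stand-in denoiser
    return x_t * 0.1 + cond.mean() * 0.01

def cfg_noise(x_t, t, text_emb, null_emb, guidance_scale=7.5):
    eps_cond = eps_model(x_t, t, text_emb)     # prediction conditioned on the prompt
    eps_uncond = eps_model(x_t, t, null_emb)   # unconditional prediction
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

x_t = torch.randn(1, 3, 8, 64, 64)             # noisy video (or video latents)
eps = cfg_noise(x_t, t=500,
                text_emb=torch.randn(77, 768),
                null_emb=torch.zeros(77, 768))
```

Higher guidance scales push the sample toward the prompt at the cost of diversity; a scale of 1.0 recovers the purely conditional prediction.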
Temporal Diffusion Scheduling
Planning the number of denoising steps allocated to each frame or temporal segment, which can be uniform or adaptive, to optimize the quality and overall consistency of the generated video based on motion complexity.
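A minimal sketch of an adaptive allocation, assuming an illustrative motion heuristic (mean absolute frame difference) and a fixed total step budget.

```python
import torch

def allocate_steps(video, num_segments=4, total_steps=200, min_steps=20):
    """Give segments with more motion a larger share of the denoising steps."""
    b, c, t, h, w = video.shape
    seg_len = t // num_segments
    motion = []
    for s in range(num_segments):
        seg = video[:, :, s * seg_len:(s + 1) * seg_len]
        motion.append((seg[:, :, 1:] - seg[:, :, :-1]).abs().mean().item())
    weights = torch.tensor(motion)
    weights = weights / weights.sum()
    extra = total_steps - num_segments * min_steps
    return [min_steps + int(round(w.item() * extra)) for w in weights]

video = torch.randn(1, 3, 16, 32, 32)
print(allocate_steps(video))                   # four values summing to roughly total_steps
```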
Diffusion-based Temporal Super-Resolution
Application of diffusion models to increase the frame rate (fps) of a video, generating coherent intermediate frames that realistically interpolate motion and changes between existing frames.
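A minimal sketch of the conditioning pattern for frame interpolation: the noisy intermediate frame is denoised while concatenated with the two existing frames it must lie between; `eps_model` is a hypothetical stand-in denoiser.

```python
import torch
import torch.nn as nn

eps_model = nn.Conv2d(9, 3, 3, padding=1)      # 3 noisy + 3 left + 3 right input channels

def interp_noise(noisy_mid, frame_left, frame_right):
    cond = torch.cat([noisy_mid, frame_left, frame_right], dim=1)    # (B, 9, H, W)
    return eps_model(cond)                     # predicted noise for the inserted frame

left = torch.randn(1, 3, 64, 64)
right = torch.randn(1, 3, 64, 64)
noisy_mid = torch.randn(1, 3, 64, 64)          # frame to generate between left and right
eps = interp_noise(noisy_mid, left, right)
```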
Video Inpainting by Diffusion
Process of filling missing or masked areas in a video sequence using a diffusion model, which generates pixels that are spatially and temporally coherent based on the context of surrounding frames.
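A minimal sketch of one blending step used in diffusion inpainting (in the spirit of RePaint-style sampling): known pixels are replaced by a freshly noised copy of the original video, so only the masked region has to be generated.

```python
import torch

def inpaint_blend(x_t, original_video, mask, t, alphas_cumprod):
    """mask == 1 marks pixels to generate; mask == 0 marks known pixels."""
    a_bar = alphas_cumprod[t]
    known_noised = (a_bar.sqrt() * original_video
                    + (1 - a_bar).sqrt() * torch.randn_like(original_video))
    return mask * x_t + (1 - mask) * known_noised

betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
video = torch.randn(1, 3, 8, 64, 64)
mask = torch.zeros_like(video)
mask[:, :, :, 16:48, 16:48] = 1.0              # hole to fill in every frame
x_t = torch.randn_like(video)                  # current noisy sample
x_t = inpaint_blend(x_t, video, mask, t=500, alphas_cumprod=alphas_cumprod)
```

Applied at every denoising step, this keeps the generated hole spatially and temporally consistent with the surrounding known content.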
Latent Motion Modeling
Technique where motion in a video is modeled and generated directly in the latent space, often by predicting displacements or transformations between latent codes of successive frames, before decoding them into images.
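A minimal sketch, assuming a hypothetical `motion_net` that predicts the displacement between the latent codes of consecutive frames; the next latent is obtained by applying the predicted delta rather than generating it from scratch.

```python
import torch
import torch.nn as nn

motion_net = nn.Linear(256, 256)               # stand-in for a real motion predictor

def next_latent(current_latent):
    delta = motion_net(current_latent)         # predicted latent-space motion
    return current_latent + delta

z0 = torch.randn(1, 256)                       # latent code of the first frame
latents = [z0]
for _ in range(7):                             # roll out 8 frames in latent space
    latents.append(next_latent(latents[-1]))
# Each latent is then decoded back into an image frame.
```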
Temporal Consistency by Constraint
Approach that adds an explicit penalty or constraint in the model's loss function to discourage appearance changes (e.g., color, texture) unrelated to motion between adjacent frames, promoting visual stability.
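A minimal sketch of the simplest form of such a constraint: a penalty on raw frame-to-frame differences. Real systems usually compensate for motion first (e.g. by warping with optical flow) so that only changes unrelated to motion are penalized.

```python
import torch

def temporal_consistency_loss(frames, weight=0.1):
    """frames: (B, C, T, H, W) generated video; penalize adjacent-frame changes."""
    diff = frames[:, :, 1:] - frames[:, :, :-1]
    return weight * diff.abs().mean()

generated = torch.randn(2, 3, 8, 64, 64, requires_grad=True)
loss = temporal_consistency_loss(generated)    # added to the main diffusion loss
loss.backward()                                # gradients discourage flicker
```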
Spatio-Temporal Noise Decomposition
Advanced method where the noise added and predicted by the model is decomposed into a spatial component (appearance) and a temporal component (motion), allowing finer control and more robust generation of dynamic videos.
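A minimal sketch of one way to decompose the noise: a spatial map shared across all frames (appearance) mixed with a per-frame residual (motion) so the result keeps unit variance; the mixing weight is an illustrative choice.

```python
import torch

def decomposed_noise(b, c, t, h, w, alpha=0.7):
    shared = torch.randn(b, c, 1, h, w).expand(b, c, t, h, w)   # same map for every frame
    residual = torch.randn(b, c, t, h, w)                       # frame-specific component
    return alpha * shared + (1 - alpha ** 2) ** 0.5 * residual  # unit-variance mixture

noise = decomposed_noise(1, 3, 8, 64, 64)      # used in place of fully i.i.d. noise
```

A larger alpha ties the frames' appearance more tightly together; a smaller alpha leaves more room for frame-to-frame change.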
Auto-Regression on Diffusion Latents
Hybrid strategy that generates a video auto-regressively frame by frame, where each latent frame is produced by a diffusion step conditioned on previous latent frames, combining the consistency of auto-regression and the quality of diffusion.
Temporal Feature Normalization
Normalization layer applied to the temporal dimension of feature maps in a 3D U-Net, stabilizing training by ensuring that the distribution of activations remains consistent across different temporal stages of the sequence.
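A minimal sketch of normalization along the temporal axis of a 3-D feature map; which axes are pooled over varies between implementations, so this is one reasonable choice.

```python
import torch

def temporal_norm(features, eps=1e-5):
    """features: (B, C, T, H, W); standardize each position across its T values."""
    mean = features.mean(dim=2, keepdim=True)
    var = features.var(dim=2, keepdim=True, unbiased=False)
    return (features - mean) / (var + eps).sqrt()

feats = torch.randn(1, 32, 8, 16, 16)
normed = temporal_norm(feats)                  # zero mean, unit variance along T
```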