AI Glossary
The Complete Dictionary of Artificial Intelligence
3D CNN
Convolutional neural network architecture applying three-dimensional convolutions on volumetric or video data to simultaneously capture spatial and temporal features.
3D Convolution
Mathematical operation that applies a three-dimensional filter to a 3D input tensor, extracting spatio-temporal features by sliding across the height, width, and depth (or temporal) dimensions.
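The sliding-filter operation described above can be sketched in plain Python. This is a minimal illustration, not how any library implements it: it assumes a single channel, stride 1, and no padding, and the function name conv3d_valid is hypothetical. As in most deep learning frameworks, the filter is not flipped, so the operation is strictly a cross-correlation.

```python
def conv3d_valid(volume, kernel):
    """Naive single-channel 3D cross-correlation (stride 1, no padding).

    volume is a nested list indexed [t][y][x];
    kernel is a nested list indexed [dt][dy][dx].
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_t, out_h, out_w = T - kt + 1, H - kh + 1, W - kw + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_t)]
    for t in range(out_t):
        for y in range(out_h):
            for x in range(out_w):
                acc = 0.0
                # Sum the element-wise products over the kernel's 3D window.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                out[t][y][x] = acc
    return out


# Toy input: 4 frames of 4x4 pixels, all ones, with a 3x3x3 averaging kernel.
volume = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
kernel = [[[1.0 / 27] * 3 for _ in range(3)] for _ in range(3)]
features = conv3d_valid(volume, kernel)
print(len(features), len(features[0]), len(features[0][0]))  # 2 2 2
```

Each output dimension shrinks by the kernel size minus one, which is why a 4x4x4 input and a 3x3x3 kernel give a 2x2x2 result.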
3D Pooling
Dimensionality reduction technique applied to 3D volumes that performs spatial and temporal downsampling to reduce computational complexity while preserving essential information.
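As a rough sketch of spatial and temporal downsampling, the following plain-Python function applies max pooling over non-overlapping 3D windows. The name max_pool3d and the choice of a stride equal to the window size are assumptions made for illustration; real networks often use different window sizes along time and space.

```python
def max_pool3d(volume, size=(2, 2, 2)):
    """Max pooling over non-overlapping 3D windows (stride equals window size).

    volume is a nested list indexed [t][y][x]; trailing elements that do not
    fill a complete window are ignored.
    """
    pt, ph, pw = size
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [
            [
                max(
                    volume[t + dt][y + dy][x + dx]
                    for dt in range(pt)
                    for dy in range(ph)
                    for dx in range(pw)
                )
                for x in range(0, W - pw + 1, pw)
            ]
            for y in range(0, H - ph + 1, ph)
        ]
        for t in range(0, T - pt + 1, pt)
    ]


# Toy volume whose values encode their own position as 100*t + 10*y + x.
volume = [[[100 * t + 10 * y + x for x in range(4)] for y in range(4)] for t in range(4)]
pooled = max_pool3d(volume)
print(pooled[0][0][0], pooled[1][1][1])  # 111 333
```

A 4x4x4 volume becomes 2x2x2, so every later layer processes one eighth as many positions.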
C3D (Convolutional 3D Network)
Pioneering 3D CNN architecture using uniform 3x3x3 convolutions throughout the entire network, demonstrating the effectiveness of three-dimensional convolutions for video analysis.
I3D (Inflated 3D ConvNet)
Architecture that 'inflates' 2D filters pre-trained on ImageNet into 3D filters by repeating their weights along the temporal dimension, transferring knowledge learned from 2D images to video.
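The inflation step can be illustrated with a short sketch. It assumes the commonly described recipe of repeating the 2D weights along the time axis and dividing by the number of repetitions, so that a video made of identical frames produces the same response as the original 2D filter on one frame. The function name inflate_2d_kernel is hypothetical.

```python
def inflate_2d_kernel(kernel_2d, time_depth):
    """Inflate a 2D kernel [y][x] into a 3D kernel [t][y][x].

    The 2D weights are copied time_depth times along the temporal axis and
    rescaled by 1 / time_depth, so the slices sum back to the 2D kernel.
    """
    return [
        [[w / time_depth for w in row] for row in kernel_2d]
        for _ in range(time_depth)
    ]


kernel_2d = [[1.0, 2.0], [3.0, 4.0]]
kernel_3d = inflate_2d_kernel(kernel_2d, time_depth=4)

# Summing the temporal slices recovers the original 2D weights.
recovered = [
    [sum(kernel_3d[t][y][x] for t in range(4)) for x in range(2)]
    for y in range(2)
]
print(recovered)  # [[1.0, 2.0], [3.0, 4.0]]
```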
ResNet3D
3D extension of the Residual Network architecture incorporating residual connections in three-dimensional convolutions to facilitate the training of very deep networks on volumetric data.
Spatio-temporal Attention
Attention mechanism that dynamically weights the importance of spatial regions and temporal instants in a video sequence to improve the recognition of complex actions.
Volumetric Feature Maps
3D tensors produced by 3D convolution operations, representing learned features at different spatial positions and temporal moments of the input sequence.
3D Kernels
Three-dimensional convolutional filters of size (d, h, w) sliding through the input volume to detect local spatio-temporal patterns in video or volumetric data.
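To make the (d, h, w) kernel size concrete, the sketch below computes the number of learnable weights in a 3D convolutional layer and the size of its output volume, using the standard floor((n + 2p - k) / s) + 1 formula per dimension. Both helper names are hypothetical, and the channel counts and clip size in the example are illustrative assumptions.

```python
def conv3d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a 3D convolutional layer with (d, h, w) kernels."""
    d, h, w = kernel_size
    return out_channels * (in_channels * d * h * w + (1 if bias else 0))


def conv3d_output_size(input_size, kernel_size, stride=(1, 1, 1), padding=(0, 0, 0)):
    """Output (depth, height, width) of a 3D convolution, one formula per axis."""
    return tuple(
        (n + 2 * p - k) // s + 1
        for n, k, s, p in zip(input_size, kernel_size, stride, padding)
    )


# 64 filters of size 3x3x3 applied to a 3-channel (RGB) clip.
print(conv3d_param_count(3, 64, (3, 3, 3)))  # 5248

# A 16-frame clip of 112x112 pixels, stride 2 with padding 1 on every axis.
print(conv3d_output_size((16, 112, 112), (3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1)))  # (8, 56, 56)
```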
Temporal Pooling
Temporal aggregation operation combining features from multiple consecutive frames to create a compact representation of the sequence while preserving dynamic information.
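A minimal sketch of temporal aggregation, assuming each frame has already been reduced to a feature vector of the same length. Averaging and element-wise maximum are two common choices; the function names are hypothetical.

```python
def temporal_average_pool(frame_features):
    """Average per-frame feature vectors into a single sequence-level vector."""
    n = len(frame_features)
    return [sum(column) / n for column in zip(*frame_features)]


def temporal_max_pool(frame_features):
    """Keep, for every feature, its strongest activation across all frames."""
    return [max(column) for column in zip(*frame_features)]


# Two frames, each described by a 2-dimensional feature vector.
frames = [[1.0, 4.0], [3.0, 0.0]]
print(temporal_average_pool(frames))  # [2.0, 2.0]
print(temporal_max_pool(frames))      # [3.0, 4.0]
```

Either way, a sequence of any length collapses to one fixed-size vector that a classifier can consume.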
Video Classification
Task of automatically classifying entire videos into predefined categories using 3D CNN architectures to analyze global spatio-temporal content.
Action Recognition
Application of 3D CNNs that identifies and classifies human actions in video sequences by capturing spatio-temporal movements and interactions.
3D Medical Imaging
Application area of 3D CNNs for analyzing volumetric medical images (CT, MRI), supporting tumor detection, organ segmentation, and computer-aided diagnosis.
Optical Flow
Vector field representing the apparent motion between consecutive frames, often integrated as an additional input channel in 3D CNN architectures to improve motion understanding.
Two-Stream Networks
Architecture combining a spatial stream (RGB frames) and a temporal stream (optical flow) fused at a later stage to capture both appearance and motion in video analysis.
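The late-fusion step can be sketched as follows, assuming each stream has already produced one logit per class. Averaging the two softmax distributions is only one simple fusion rule among several; the function names, the temporal_weight parameter, and the example logits are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Blend the class probabilities of the spatial and temporal streams."""
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    return [
        (1.0 - temporal_weight) * s + temporal_weight * t
        for s, t in zip(spatial, temporal)
    ]


# The RGB stream prefers class 0, while the optical-flow stream strongly prefers class 1.
fused = late_fusion([2.0, 1.0, 0.1], [0.5, 3.0, 0.2])
predicted = max(range(len(fused)), key=fused.__getitem__)
print(predicted)  # 1
```

In this example the more confident motion stream outweighs the appearance stream, so the fused prediction is class 1.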
Spatio-temporal Sampling
Strategy of sampling contiguous, non-overlapping video segments (clips) during training, covering the temporal dimension efficiently at a controlled computational cost.
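The segment layout described above can be sketched as a list of frame indices per clip. The function name split_into_clips is hypothetical, and dropping the incomplete trailing segment is an assumption; padding or looping the last clip would be equally plausible.

```python
def split_into_clips(num_frames, clip_length):
    """Split frame indices into contiguous, non-overlapping clips.

    Trailing frames that do not fill a complete clip are dropped.
    """
    return [
        list(range(start, start + clip_length))
        for start in range(0, num_frames - clip_length + 1, clip_length)
    ]


# A 10-frame video cut into 4-frame clips; frames 8 and 9 are left over.
print(split_into_clips(10, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```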
Volumetric Data
Data structured along three dimensions (x, y, z), such as medical scans, 3D models, or video clips stacked along the time axis.
Multi-view CNN
Approach that processes multiple perspectives or views of a 3D object or video scene, typically with a shared convolutional network applied to each view, and aggregates the per-view features to capture complex geometric relationships.
Deep 3D CNN
3D CNN architectures with many stacked convolutional layers (for example, 50 or more) that can learn very complex spatio-temporal feature hierarchies for advanced tasks.
Temporal Modeling
Ability of 3D CNNs to capture and model temporal dependencies and feature evolution over time, essential for understanding the dynamics of video sequences.