AI Glossary
The Complete Dictionary of Artificial Intelligence
Transformer for Time Series
Deep neural network architecture, initially designed for NLP, adapted to model complex and long-term dependencies in temporal sequential data through its attention mechanisms.
Self-Attention
Process where each element in a time sequence interacts with all other elements in the same sequence to compute a contextual representation, essential for understanding internal dependencies.
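As a minimal sketch of this interaction, the following dependency-free Python computes scaled dot-product self-attention over a short sequence. It assumes that queries, keys, and values are the input vectors themselves (identity projections); a trained Transformer would learn separate projections. The input values are made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention over a list of vectors.

    Assumption: queries, keys and values are the inputs themselves
    (identity projections); a trained model learns separate projections.
    Returns (outputs, weights), where weights[i][j] is how much step i
    attends to step j.
    """
    d = len(x[0])
    weights = []
    for q in x:
        # Similarity of this step's query with every step's key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights.append(softmax(scores))
    # Each output is a weighted average of all value vectors.
    outputs = [
        [sum(w * v[c] for w, v in zip(row, x)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights

# Three time steps, each embedded as a 2-dimensional vector (made-up values).
series = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(series)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the whole sequence, which is what makes the representation contextual.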
PatchTST
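Transformer model that segments the time series into fixed-length subsequences (patches) and treats each patch as one input token. Because attention then runs over far fewer tokens than raw time steps, computational cost drops; each token carries local context, while attention across patches captures global dependencies.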
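The patching step can be sketched as follows. The function name `make_patches` and its parameters are illustrative, not taken from any library, and the sketch simply drops trailing values that do not fill a whole patch rather than padding the end of the series.

```python
def make_patches(series, patch_len, stride):
    """Split a 1-D series into (possibly overlapping) fixed-length patches.

    Simplification: trailing values that do not fill a whole patch are
    dropped; no end padding is applied.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[s:s + patch_len] for s in range(0, last_start + 1, stride)]

series = list(range(10))  # 10 time steps
patches = make_patches(series, patch_len=4, stride=2)
# Attention now runs over len(patches) tokens instead of len(series).
```

Here 10 time steps become 4 tokens, so a full attention matrix shrinks from 100 entries to 16.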
Informer
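Enhanced Transformer architecture for long-sequence forecasting that combines a sparse (ProbSparse) self-attention mechanism with self-attention distilling, which shortens the sequence between encoder layers. Together these reduce the time and memory cost of attention and mitigate the degradation of forecasts over long horizons.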
Sparse Attention
Variant of the attention mechanism in which each token attends only to a selected subset of other tokens. Depending on the sparsity pattern, this reduces computational cost from O(n²) to O(n log n) or O(n).
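One simple sparsity pattern is a fixed local window, sketched below. This is a generic illustration and not the selection rule used by any particular model (Informer, for example, chooses which queries to keep in a different way). The helper names are hypothetical.

```python
def local_window_mask(n, window):
    """mask[i][j] is True when step i may attend to step j (|i - j| <= window)."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]

def count_attended_pairs(mask):
    """Number of (query, key) pairs that are actually scored."""
    return sum(sum(row) for row in mask)

n = 8
mask = local_window_mask(n, window=1)
sparse_pairs = count_attended_pairs(mask)  # grows like n * (2 * window + 1)
dense_pairs = n * n                        # full attention scores every pair
```

With a fixed window the number of scored pairs grows linearly in n, which is the O(n) case mentioned above.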
Long-Horizon Forecasting
Prediction task involving estimating time series values over an extended time horizon, a major challenge for which Transformers are widely used because attention can relate distant time steps directly.
Long-Term Temporal Dependency
Statistical relationship between an observation and values far in the past. Recurrent models such as RNNs struggle to capture it because information must pass through every intermediate step, whereas attention in Transformers connects any two time steps directly.
Multi-Head Attention
Layer that runs several attention heads in parallel, each with its own learned projections, and concatenates their outputs so the model can attend to different positions and extract richer features.
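The split, attend, concatenate pattern can be sketched in plain Python. As a loudly stated simplification, the learned input and output projections are omitted, so each head simply sees its own slice of the feature dimension; the input values are made up.

```python
import math

def attend(x):
    """Single-head scaled dot-product self-attention (identity projections)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[c] for e, v in zip(exps, x)) for c in range(d)])
    return out

def multi_head_attention(x, num_heads):
    """Split features into heads, attend per head, concatenate the results.

    Assumption: learned input/output projections are omitted, so each head
    simply sees its own slice of the feature dimension.
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        head_input = [row[h * d_head:(h + 1) * d_head] for row in x]
        head_outputs.append(attend(head_input))
    # Concatenate along the feature dimension, one row per time step.
    return [
        [value for head in head_outputs for value in head[t]]
        for t in range(len(x))
    ]

x = [[1.0, 0.0, 0.5, 2.0], [0.0, 1.0, 1.5, 0.0], [1.0, 1.0, 0.0, 1.0]]
y = multi_head_attention(x, num_heads=2)
```

Because each head computes its own attention weights, the two halves of every output row can emphasize different time steps.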
Bottleneck Transformer
Architecture variant that compresses the input sequence into a lower-dimensional latent space before applying attention mechanisms, to efficiently handle very long time series.
Time Series Tokenization
Process of discretizing or segmenting a continuous time series into a sequence of discrete 'tokens', which serve as input to the Transformer's processing layers.
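For the discretizing variant, a minimal sketch is uniform binning, where each value is mapped to the index of the bin it falls into. The function `quantize` and the choice of equal-width bins are assumptions made for illustration; real systems may use other schemes such as patching or learned codebooks.

```python
def quantize(series, num_bins):
    """Map each value to an integer token id in [0, num_bins - 1] using equal-width bins."""
    low, high = min(series), max(series)
    if high == low:
        # A constant series carries no level information; use a single token.
        return [0 for _ in series]
    width = (high - low) / num_bins
    # The maximum value would land in bin num_bins, so clamp it to the last bin.
    return [min(int((v - low) / width), num_bins - 1) for v in series]

series = [0.0, 0.4, 1.0, 2.5, 3.9, 4.0]
tokens = quantize(series, num_bins=4)
```

The resulting integer ids can then be looked up in an embedding table, just like word tokens in NLP.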
Wash-out Effect
Phenomenon where relevant information from old time steps is lost or 'washed out' during propagation through multiple layers of a model, a problem that attention mechanisms aim to solve.
Quadratic Complexity
Computational cost of O(n²) in both time and memory for standard attention, where n is the sequence length, which constitutes the main limitation of Transformers for very long time series.
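To make the growth concrete, the sketch below estimates the memory needed for a single n × n attention score matrix. It assumes 4 bytes per value (32-bit floats) and counts only one head in one layer, so real models need a multiple of this.

```python
def attention_scores_bytes(n, bytes_per_value=4):
    """Memory for one n x n attention score matrix (one head, one layer).

    Assumption: 4 bytes per value, i.e. 32-bit floats.
    """
    return n * n * bytes_per_value

for n in (1_000, 10_000, 100_000):
    mib = attention_scores_bytes(n) / 2**20
    print(f"n={n:>7}: {mib:,.1f} MiB")
```

Doubling the sequence length quadruples this figure, which is why patching and sparse attention are attractive for long series.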
Contextual Representation
Embedding vector for a given time step that is computed from all time steps in the sequence, including the step itself, thus capturing its meaning and importance in the global context.
Encoder-Decoder Layers
Transformer structure where the encoder processes the input sequence (history) to create a representation, and the decoder uses this representation to generate the output sequence (forecasts) step by step.
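The step-by-step generation loop can be sketched as follows. Here `predict_next` is a hypothetical stand-in for the decoder, and `naive_drift` is a toy predictor used only so the example runs; neither represents an actual Transformer.

```python
def forecast_step_by_step(history, predict_next, horizon):
    """Generate `horizon` forecasts, feeding each one back as decoder input.

    `predict_next(history, generated)` is a hypothetical stand-in for the
    decoder: it receives the history and the values produced so far.
    """
    generated = []
    for _ in range(horizon):
        generated.append(predict_next(history, generated))
    return generated

def naive_drift(history, generated):
    """Toy predictor: last known value plus the average historical step."""
    step = (history[-1] - history[0]) / (len(history) - 1)
    last = generated[-1] if generated else history[-1]
    return last + step

forecasts = forecast_step_by_step([1.0, 2.0, 3.0, 4.0], naive_drift, horizon=3)
```

Because each forecast depends on the previous ones, errors made early in the horizon can propagate to later steps.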