
AI Glossary

The complete dictionary of artificial intelligence

162
Categories
2,032
Subcategories
23,060
Terms
📖
Terms

Position-wise Feed-Forward Network

Feed-forward network applied independently to each position of the sequence in the Transformer architecture, performing a nonlinear transformation after the attention mechanism.


GELU Activation

Gaussian Error Linear Unit, a smooth activation function used in Transformer FFNs, defined as x · Φ(x) with Φ the standard Gaussian CDF; it can be motivated as the expectation of a stochastic regularizer combining properties of dropout and ReLU.
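The definition above can be computed directly; a minimal pure-Python sketch of the exact GELU (the helper name `gelu` is illustrative, not from any particular library):

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
    # computed here via the error function.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

For large positive x the function approaches the identity, and for large negative x it approaches zero, with a smooth transition in between (e.g. GELU(1) ≈ 0.841, the value of Φ(1)).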


Two-layer MLP

Standard architecture of the FFN in Transformers, consisting of two linear transformations with a nonlinear activation function between them.
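A minimal sketch of such a two-layer FFN applied to a single position's vector, in plain Python with ReLU as the activation (the name `ffn`, the tiny dimensions, and the list-based matrices are illustrative; real models use optimized tensor libraries):

```python
def ffn(x, W1, b1, W2, b2):
    # First linear layer plus ReLU nonlinearity: h = relu(x @ W1 + b1)
    h = [max(0.0, sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j])
         for j in range(len(b1))]
    # Second linear layer projects back down: y = h @ W2 + b2
    return [sum(h[j] * W2[j][k] for j in range(len(h))) + b2[k]
            for k in range(len(b2))]
```

In a Transformer, the same weight matrices would be applied to every position of the sequence independently.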


Hidden Dimension Expansion

Dimensionality increase in the first layer of the FFN (typically 4x the model dimension) before reduction in the second layer, allowing more expressive capacity.


Feed-Forward Dimension

Intermediate dimension of the FFN in Transformers, typically four times larger than the model dimension to increase representation capacity.


Position-independent Processing

Fundamental property of FFNs: the same weights are applied to every position, unlike the attention mechanism, which mixes information across positions.


Swish Activation

Alternative activation function to GELU in FFNs, defined as x · sigmoid(βx); smooth and non-monotonic, with performance comparable to GELU.
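The formula translates directly to code; a pure-Python sketch (with β = 1 this is also known as SiLU):

```python
import math

def swish(x: float, beta: float = 1.0) -> float:
    # Swish: x * sigmoid(beta * x), written as x / (1 + exp(-beta * x))
    return x * (1.0 / (1.0 + math.exp(-beta * x)))
```

Like GELU, Swish is zero at the origin, nearly linear for large positive inputs, and dips slightly below zero for moderate negative inputs before decaying toward zero.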


GLU Variants

Gated Linear Units and their variants (GeGLU, SwiGLU) used as alternatives to standard FFNs, introducing gating mechanisms for selective information flow control.
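One such variant, SwiGLU, gates a linearly projected value with a Swish-activated projection before the output projection. A minimal list-based sketch, omitting biases for brevity (the name `swiglu_ffn` and the tiny matrices are illustrative):

```python
import math

def swiglu_ffn(x, W, V, W2):
    # SwiGLU FFN: output = (swish(x @ W) * (x @ V)) @ W2
    def swish(z):
        return z / (1.0 + math.exp(-z))
    gate = [swish(sum(x[i] * W[i][j] for i in range(len(x))))
            for j in range(len(W[0]))]
    value = [sum(x[i] * V[i][j] for i in range(len(x)))
             for j in range(len(V[0]))]
    hidden = [g * v for g, v in zip(gate, value)]  # elementwise gating
    return [sum(hidden[j] * W2[j][k] for j in range(len(hidden)))
            for k in range(len(W2[0]))]
```

The gate lets the network suppress or pass each hidden unit selectively, which standard two-layer FFNs cannot do.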


Feed-Forward Sublayer

The sublayer of the Transformer block that contains the FFN, wrapped in a residual connection and layer normalization to stabilize training.


Linear Transformation Matrices

Weights W1 and W2 of the FFN: W1 projects to the expanded hidden dimension and W2 projects back to the original model dimension.


FFN Dropout

Regularization mechanism applied after activation in Transformer FFNs, randomly deactivating neurons to prevent overfitting.


Inner Layer Normalization

Application of layer normalization before or after the FFN in Transformer architecture, with pre-norm and post-norm variants affecting training stability.
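The two orderings differ only in where the normalization sits relative to the residual add; a schematic sketch, assuming `ffn` and `norm` are given as functions (the function names are illustrative):

```python
def pre_norm_ffn_sublayer(x, ffn, norm):
    # Pre-norm: normalize first, apply the sublayer, then add the residual.
    return [xi + yi for xi, yi in zip(x, ffn(norm(x)))]

def post_norm_ffn_sublayer(x, ffn, norm):
    # Post-norm (original Transformer): apply the sublayer, add the
    # residual, then normalize the sum.
    return norm([xi + yi for xi, yi in zip(x, ffn(x))])
```

In pre-norm the residual path is an unnormalized identity shortcut, which is one reason deep pre-norm stacks tend to train more stably.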


Mixture of Experts FFN

Extension of standard FFNs using multiple FFN experts selectively activated by a routing network, allowing capacity increase without proportional computational increase.
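The routing idea can be sketched with top-1 selection, where experts are passed in as callables and a linear router scores them (all names here are illustrative; real MoE layers use softmax routing, load balancing, and often top-2 selection):

```python
def moe_ffn(x, experts, router_weights):
    # Score each expert with a linear router over the input vector,
    # then run only the single highest-scoring expert (top-1 routing).
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in router_weights]
    best = max(range(len(scores)), key=lambda e: scores[e])
    return experts[best](x)
```

Because only one expert runs per token, total parameter count can grow with the number of experts while per-token compute stays roughly that of a single FFN.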


ReLU-based FFN

FFN variant using ReLU as the activation function, as in the original Transformer; simpler, but typically slightly outperformed by GELU in modern models.


Feed-Forward Projection

Linear projection operation in FFNs transforming representations between spaces of different dimensionalities to capture complex relationships.


Adaptive FFN

Advanced FFN architecture dynamically adjusting its parameters based on input context, improving flexibility for specific tasks.
