AI Glossary
The complete dictionary of Artificial Intelligence
AutoInt
Deep neural network architecture designed to automatically model high-order feature interactions in tabular data, using a multi-head attention mechanism.
Multi-Head Attention Mechanism
Module that allows the model to simultaneously focus on different positions in the input sequence, learning multiple attention representations in parallel to capture complex dependencies.
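As an illustrative sketch (not AutoInt's exact implementation), a minimal multi-head self-attention over feature embeddings can be written in NumPy; all shapes and weight matrices here are assumptions for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, n_heads):
    """Self-attention over the rows of X, shape (n_features, d_model)."""
    n, d = X.shape
    d_head = d // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Split each projection into heads: (n_heads, n, d_head).
    split = lambda M: M.reshape(n, n_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores)                    # one attention map per head
    out = weights @ Vh                           # (n_heads, n, d_head)
    return out.transpose(1, 0, 2).reshape(n, d)  # concatenate heads

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # four feature embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = multi_head_attention(X, Wq, Wk, Wv, n_heads=2)
```

Each head attends over the same inputs with its own projected queries and keys, so different heads can specialize in different interaction patterns.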
High-Order Feature Interaction
Non-linear combination of three or more input variables, whose capture is essential to improve the predictive power of models on complex structured data.
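A toy illustration of why such interactions matter: if the target is the product of three inputs, each input alone is uncorrelated with the target, so a plain linear model learns essentially nothing (the data here is synthetic and for demonstration only):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=(1000, 3))
y = x[:, 0] * x[:, 1] * x[:, 2]   # purely third-order signal

# Each x_i alone carries no information about y, so least squares
# assigns near-zero weight to every input.
w, *_ = np.linalg.lstsq(x, y, rcond=None)
```

A model that captures the third-order interaction x1*x2*x3 predicts y exactly; the linear model cannot do better than chance.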
Feature Embedding
Dense and low-dimensional vector representation of categorical features, allowing the model to treat these variables as continuous inputs and learn their semantic relationships.
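A minimal sketch of an embedding lookup, assuming a hypothetical categorical feature with five levels; the table would normally be learned during training:

```python
import numpy as np

rng = np.random.default_rng(0)
n_categories, d_embed = 5, 4          # e.g. a "city" feature with 5 levels
E = rng.normal(scale=0.1, size=(n_categories, d_embed))  # embedding table

category_ids = np.array([0, 3, 3, 1])  # raw categorical inputs
embeddings = E[category_ids]           # dense vectors, shape (4, 4)
```

Identical category values map to the identical embedding row, which is how the model learns shared semantics across occurrences.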
Interaction Network
Subnetwork within AutoInt, the interacting layer, that explicitly models interactions between feature embedding vectors by applying the attention mechanism.
Attention Value
Weight computed by the attention mechanism, typically a softmax-normalized compatibility score, that quantifies the importance of a specific feature or interaction for the model's final prediction.
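A minimal sketch of how such weights are obtained, softmax-normalizing one query's compatibility with each key (the function name is illustrative):

```python
import numpy as np

def attention_weights(q, K):
    """Softmax-normalized compatibility of one query with each key row."""
    scores = K @ q / np.sqrt(q.shape[0])  # scaled dot-product scores
    e = np.exp(scores - scores.max())     # subtract max for stability
    return e / e.sum()                    # weights are positive, sum to 1

rng = np.random.default_rng(1)
w = attention_weights(rng.normal(size=8), rng.normal(size=(5, 8)))
```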
Attention Pooling
Aggregation operation that uses attention weights to combine feature representations, producing a context vector that highlights the most relevant information.
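A worked toy example with hand-picked weights (all values are illustrative):

```python
import numpy as np

# Hypothetical feature representations, one row per feature.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
weights = np.array([0.7, 0.2, 0.1])   # attention weights, sum to 1

context = weights @ V   # attention pooling: weighted sum of the rows
# context == [0.8, 0.3]
```

The first feature dominates the context vector because it received the largest attention weight.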
Automatic Interaction Learning
Paradigm where the model itself discovers and ranks relevant feature interactions, without requiring manual engineering or a priori specification.
Query Vector
In the attention mechanism, a vector derived from the current element (here, a feature embedding) and used to compute a compatibility score with each key vector.
Key Vector
Representation of a feature or candidate interaction, compared against the query vector to determine how much attention that element receives.
Value Vector
Vector containing the actual information of a feature, which is weighted by the attention score and aggregated to form the output of the attention mechanism.
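Putting query, key, and value together, a single-head attention pass over three hypothetical feature embeddings can be sketched as (projection matrices here are random stand-ins for learned parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
X = rng.normal(size=(3, d))        # three feature embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv   # query, key, value per feature
scores = Q @ K.T / np.sqrt(d)      # query-key compatibility
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)  # softmax per row
output = weights @ V               # values aggregated by attention
```

Each output row is a mixture of all value vectors, weighted by how compatible that feature's query was with every key.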
Scaled Dot Product
Similarity function used in attention to calculate scores, where the dot product is divided by the square root of the vector dimension to stabilize training.
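A quick numerical check of why the scaling helps: the standard deviation of raw dot products of random vectors grows like the square root of the dimension, while the scaled scores stay near 1 (dimensions and sample counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 256, 10_000
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))

raw = (Q * K).sum(axis=1)   # dot products: std grows like sqrt(d) = 16
scaled = raw / np.sqrt(d)   # scaled scores: std stays near 1
```

Keeping score variance near 1 prevents the softmax from saturating, which would otherwise produce vanishing gradients in high dimensions.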
Residual and Layer Normalization
Architecture technique where the output of a layer is added to its input (residual connection) and then normalized, facilitating the training of deep networks like AutoInt.
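A minimal NumPy sketch of the add-and-normalize pattern (the learnable gain and bias parameters of layer normalization are omitted for brevity):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, sublayer):
    """Add the sublayer output to its input, then normalize."""
    return layer_norm(x + sublayer(x))

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))
W = rng.normal(size=(8, 8))
out = residual_block(X, lambda x: x @ W)  # e.g. a linear sublayer
```

The residual path lets gradients flow directly through the identity, which is what makes stacking many such layers trainable.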
Cross Interaction
Specific operation in AutoInt that calculates interactions between feature pairs using element-wise multiplication followed by linear transformation.
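A minimal sketch of this pairwise operation on two hypothetical embeddings, assuming a generic learned weight matrix W:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
e_i, e_j = rng.normal(size=(2, d))   # embeddings of two features
W = rng.normal(size=(d, d))          # learned linear transformation

cross = (e_i * e_j) @ W              # element-wise product, then linear map
```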
Attention Head
One of the multiple attention mechanisms working in parallel in a multi-head module, each learning to focus on different aspects of feature interactions.
Head Aggregation
Process of concatenating or averaging the outputs of all attention heads to form a unified representation before passing it to the next layer.
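Both aggregation strategies in a toy setting with three hypothetical head outputs:

```python
import numpy as np

# Outputs of three attention heads, one row per feature (toy values).
heads = [np.full((2, 4), h) for h in (1.0, 2.0, 3.0)]

concat = np.concatenate(heads, axis=-1)  # (2, 12): preserves every head
mean = np.mean(heads, axis=0)            # (2, 4): keeps dimension fixed
```

Concatenation keeps each head's information separate at the cost of a wider representation; averaging keeps the dimension fixed but mixes the heads.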
Long-Range Dependency Modeling
Ability of the attention mechanism to relate any pair of features in a single step, regardless of their position in the input, overcoming the locality limitations of models such as CNNs.
Attention Map Interpretability
Method to visualize and understand model decisions by analyzing attention weights, revealing which feature interactions were most influential.
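A minimal sketch of ranking feature pairs by attention weight; the helper name, weight values, and feature names are all hypothetical:

```python
import numpy as np

def top_interactions(weights, feature_names, k=2):
    """Return the k (query, key, weight) triples with largest weight."""
    flat_desc = np.argsort(weights, axis=None)[::-1]      # largest first
    rows, cols = np.unravel_index(flat_desc, weights.shape)
    return [(feature_names[i], feature_names[j], weights[i, j])
            for i, j in zip(rows[:k], cols[:k])]

weights = np.array([[0.1, 0.8, 0.1],   # toy attention map
                    [0.5, 0.3, 0.2],
                    [0.2, 0.2, 0.6]])
names = ["age", "city", "income"]
top = top_interactions(weights, names, k=2)
# top[0] is ("age", "city", 0.8): the most influential interaction
```

Inspecting such rankings (or plotting the full weight matrix as a heatmap) is the usual way attention maps are turned into explanations.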