AI Glossary
The complete dictionary of Artificial Intelligence
Scaling Laws
Mathematical principles describing how deep learning model performance improves predictably with increases in model size, data, and computation.
Power Law Scaling
Mathematical relationship where model performance follows a power law in factors such as model size (number of parameters), dataset size, or training compute.
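An illustrative form of such a relationship, where the characteristic scale and the exponent are fitted empirically and differ across tasks and datasets:

```latex
% Illustrative power-law form: loss L as a function of model size N.
% N_c (a characteristic scale) and \alpha_N (the scaling exponent)
% are empirically fitted constants, not universal values.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
```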
Chinchilla Scaling Laws
Specific scaling laws discovered by DeepMind suggesting that many large models were undertrained: for a fixed compute budget, training data should be scaled up together with model size rather than prioritizing parameters alone.
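A widely cited rule of thumb from these results is roughly 20 training tokens per parameter; the arithmetic below reproduces the published Chinchilla configuration as a worked example.

```latex
% Chinchilla heuristic of roughly 20 training tokens per parameter,
% applied to a 70B-parameter model:
D \approx 20\,N = 20 \times 7\times10^{10} = 1.4\times10^{12}\ \text{tokens}
```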
Compute-Optimal Scaling
Strategy for optimally allocating computational resources between model size and training data quantity to maximize performance at a fixed budget.
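A minimal sketch of this allocation, assuming the common approximation C ≈ 6·N·D for training FLOPs and a Chinchilla-style heuristic of about 20 tokens per parameter; both are rough empirical rules, and the function name is illustrative.

```python
import math

def compute_optimal_allocation(flop_budget: float, tokens_per_param: float = 20.0):
    """Split a fixed FLOP budget between parameters N and training tokens D.

    Assumes C ~= 6 * N * D (a rough estimate of transformer training FLOPs)
    and D ~= tokens_per_param * N (a Chinchilla-style heuristic).
    Solving 6 * N * (k * N) = C gives N = sqrt(C / (6 * k)).
    """
    n_params = math.sqrt(flop_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a 1e23 FLOP budget suggests roughly a 29B-parameter model
# trained on roughly 0.6T tokens under these assumptions.
n, d = compute_optimal_allocation(1e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```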
Data Scaling Laws
Principles describing how increasing the amount of training data influences model performance, often following a power law relationship with saturation.
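An illustrative functional form capturing both the power-law regime and the saturation floor (the irreducible loss):

```latex
% Illustrative data-scaling form: D_c, \alpha_D and the irreducible
% loss L_\infty are fitted constants, not universal values.
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D} + L_\infty
```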
Model Size Scaling
Study of how model capabilities evolve based on the number of parameters, revealing predictable improvements up to certain saturation points.
Token Scaling
Analysis of the impact of the number of training tokens on model performance, essential for determining the optimal amount of textual data.
Emergent Abilities
Capabilities that suddenly appear in large models at certain critical scales, which are not present in smaller models of the same family.
Phase Transitions
Abrupt changes in model behavior or performance that occur at specific size or data thresholds.
Neural Scaling Laws
General theoretical framework unifying empirical observations on neural network scaling across different architectures and tasks.
Kaplan Scaling Laws
First empirical scaling laws established by OpenAI, showing power-law relationships between model size, data, compute, and performance.
IsoFLOP Curves
Performance curves at constant FLOP budget allowing comparison of different architectures or training strategies at equal computational cost.
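A sketch of how an IsoFLOP sweep can be read off a Chinchilla-style parametric loss L(N, D) = E + A/N^α + B/D^β; the constants below are illustrative placeholders rather than published fits, and C ≈ 6·N·D is again assumed.

```python
import numpy as np

# Chinchilla-style parametric loss L(N, D) = E + A / N**alpha + B / D**beta.
# The constants below are illustrative placeholders, not published fits.
E, A, B, ALPHA, BETA = 1.7, 400.0, 400.0, 0.34, 0.28

def loss(n_params, n_tokens):
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def isoflop_minimum(flop_budget: float):
    """Sweep model sizes at a fixed FLOP budget (C ~= 6*N*D) and
    return the configuration with the lowest predicted loss."""
    n_grid = np.logspace(8, 12, 400)          # candidate parameter counts
    d_grid = flop_budget / (6.0 * n_grid)     # tokens implied by the budget
    losses = loss(n_grid, d_grid)
    best = int(np.argmin(losses))
    return n_grid[best], d_grid[best], losses[best]

n_opt, d_opt, l_opt = isoflop_minimum(1e23)
print(f"optimal N ~ {n_opt:.2e}, D ~ {d_opt:.2e}, predicted loss {l_opt:.3f}")
```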
Critical Batch Size
Optimal batch size beyond which further increase no longer produces significant improvements in training speed.
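One simplified estimate used in the large-batch training literature relates the critical batch size to the gradient noise scale:

```latex
% Simplified gradient-noise-scale estimate of the critical batch size:
% \Sigma is the per-example gradient covariance and G the true gradient.
B_{\mathrm{crit}} \approx \frac{\operatorname{tr}(\Sigma)}{\lVert G \rVert^{2}}
```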
Double Descent
Phenomenon where test error decreases, increases, and then decreases again as model capacity grows past the interpolation threshold, the point at which the model can fit the training data exactly.
Grokking
Phenomenon where models suddenly acquire generalizable understanding after a long period of apparent overfitting.
Sharpness-Aware Minimization
Optimization technique seeking flat minima in the loss landscape, particularly important for the stability of large models.
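A minimal sketch of the two-step SAM update, written with NumPy and a user-supplied gradient function; `grad_fn`, `rho`, and `lr` are illustrative names rather than a specific library's API.

```python
import numpy as np

def sam_step(params, grad_fn, rho=0.05, lr=0.1):
    """One Sharpness-Aware Minimization update.

    grad_fn(params) must return the loss gradient at `params`.
    1) Perturb the weights toward the locally worst-case direction.
    2) Descend using the gradient at the perturbed point, which biases
       optimization toward flat minima.
    """
    g = grad_fn(params)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent step to the "sharp" neighbor
    g_sharp = grad_fn(params + eps)               # gradient at the perturbed weights
    return params - lr * g_sharp                  # update the original weights

# Toy usage on the quadratic loss f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, grad_fn=lambda p: 2.0 * p)
print(w)  # moves toward the minimum at the origin
```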
Loss Scaling
Prediction of the evolution of the loss function based on allocated resources, allowing performance estimation before training.
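An illustrative compute-based form, with constants fitted empirically for a given architecture and dataset:

```latex
% Illustrative compute-scaling form: C is training compute (FLOPs),
% C_c and \alpha_C are empirically fitted constants.
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```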
Performance Plateaus
Phases of stagnation in performance improvement despite increasing resources, indicating limits in current scaling laws.
Scaling Exponent
Crucial parameter in power laws determining the rate of performance improvement relative to resource increase.
Scaling Coefficient
Multiplicative constant (prefactor) in a scaling power law that sets the overall level of performance, complementing the exponent, which sets the rate of improvement.
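As a small illustration of how the exponent and the coefficient of these last two entries are recovered in practice, here is a log-log fit on synthetic measurements (the numbers are invented purely for the example):

```python
import numpy as np

# Recover the coefficient A and exponent alpha of L(N) = A * N**(-alpha)
# from synthetic (model size, loss) measurements.
sizes = np.array([1e7, 1e8, 1e9, 1e10])
losses = 8.0 * sizes ** -0.07           # synthetic data with A = 8.0, alpha = 0.07

slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha = -slope                          # scaling exponent: rate of improvement
A = np.exp(intercept)                   # scaling coefficient: overall level
print(f"alpha ~ {alpha:.3f}, A ~ {A:.2f}")
```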