AI Glossary
A complete glossary of artificial intelligence
Chinchilla Scaling Law
Empirical principle established by DeepMind stating that, for a compute-optimal training run, model size and training data volume should be scaled in equal proportion, with a tokens-to-parameters ratio of approximately 20:1.
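As a worked example of the rule of thumb above, here is a minimal Python sketch that derives a compute-optimal (N, D) pair from a FLOPs budget. It assumes the common approximation that training cost C ≈ 6·N·D FLOPs; both that approximation and the 20 tokens-per-parameter ratio are rules of thumb, not exact constants.

```python
# Minimal sketch: compute-optimal allocation under the Chinchilla rule of thumb.
# Assumptions: training cost C ~= 6 * N * D FLOPs, and a compute-optimal
# tokens-per-parameter ratio of about 20 (both approximations, not exact laws).

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Return (parameters N, training tokens D) for a given FLOPs budget."""
    # From C = 6 * N * D and D = r * N:  C = 6 * r * N**2  =>  N = sqrt(C / (6 * r)).
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = chinchilla_optimal(5.76e23)  # roughly the Chinchilla training budget
print(f"N ≈ {n:.2e} parameters, D ≈ {d:.2e} tokens")
```

With the roughly 5.76e23 FLOPs budget reported for Chinchilla, this recovers approximately 70B parameters and 1.4T tokens, matching the published configuration.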
Power Law
Mathematical relationship of the form L(N) ∝ N^(−α), L(D) ∝ D^(−β), L(C) ∝ C^(−γ), where loss L decreases predictably as a power of the number of parameters N, the dataset size D, or the computational budget C.
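One widely used concrete instance is the additive Chinchilla parametrization L(N, D) = E + A·N^(−α) + B·D^(−β). The sketch below evaluates it with illustrative constants in the ballpark of the fit reported by Hoffmann et al. (2022); treat them as examples, not canonical values.

```python
# Sketch: Chinchilla-style parametric loss L(N, D) = E + A/N**alpha + B/D**beta.
# Constants are illustrative, close to the Hoffmann et al. (2022) fit.

def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    # E is the irreducible loss; the two power-law terms shrink as N and D grow.
    return E + A / n_params**alpha + B / n_tokens**beta

print(predicted_loss(70e9, 1.4e12))  # a Chinchilla-scale model, roughly 1.94
```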
Scaling Transfer
Phenomenon whereby scaling laws fitted on smaller models accurately predict the performance of much larger models, even before those models are fully trained.
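A minimal illustration: fit a power law L(N) = a·N^(−α) to results from small pilot models by linear regression in log-log space, then extrapolate to a larger target size. All data points and the target below are invented for the example.

```python
# Sketch: fit L(N) = a * N**(-alpha) to small-model losses, then extrapolate.
import numpy as np

n_params = np.array([1e7, 3e7, 1e8, 3e8])      # small pilot models (hypothetical)
losses   = np.array([4.20, 3.85, 3.52, 3.24])  # their validation losses (invented)

# A power law is linear in log-log space: log L = log a - alpha * log N.
slope, intercept = np.polyfit(np.log(n_params), np.log(losses), 1)
alpha, a = -slope, np.exp(intercept)

n_target = 1e10  # a model ~30x larger than anything in the fit
print(f"alpha ≈ {alpha:.3f}; predicted loss at N = {n_target:.0e}: "
      f"{a * n_target ** (-alpha):.2f}")
```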
Optimal Computational Budget
Resource allocation (FLOPs) that maximizes model performance for a given computational cost by judiciously balancing model size against training data quantity.
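As a sketch of how such an allocation can be found numerically, the snippet below sweeps candidate model sizes under a fixed FLOPs budget (again assuming C ≈ 6·N·D and the illustrative loss constants from the Power Law entry) and keeps the minimum-loss split.

```python
# Sketch: grid-search the compute-optimal split between parameters and tokens.
import numpy as np

def loss(n, d, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    # Illustrative Chinchilla-style parametric loss (see the Power Law entry).
    return E + A / n**alpha + B / d**beta

C = 1e22                          # fixed compute budget in FLOPs (example value)
ns = np.logspace(8, 11, 400)      # candidate parameter counts, 100M to 100B
ds = C / (6.0 * ns)               # tokens implied by the C ~= 6*N*D approximation

best = np.argmin(loss(ns, ds))
print(f"best split: N ≈ {ns[best]:.2e} parameters, D ≈ {ds[best]:.2e} tokens")
```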
Data Saturation
Point beyond which increasing the training data volume no longer yields significant improvement for a given model size, indicating that model capacity has become the bottleneck.
Scaling Exponent
Coefficient (α, β, γ) in the power law that quantifies how efficiently performance improves when increasing the number of parameters, data size, or computational budget respectively.
Compute-Bound Regime
Training regime in which performance is limited primarily by the available computational resources, making it more effective to grow the model than to add data.
Data-Bound Regime
Training regime in which performance is limited primarily by the quantity and quality of available data, making it more effective to add data than to grow the model.
Predicted Test Loss
Value of the loss on a test dataset, estimated in advance using scaling laws based on model size, data size, and computational budget.
Critical Scaling
Model size threshold beyond which performance gains follow a steeper scaling law, often observed in very large language models.
Emergence via Scaling
Appearance of new capabilities (reasoning, understanding) that did not exist in smaller models and emerge spontaneously when model size exceeds a certain critical threshold.
Scaling Efficiency
Measure of performance obtained per unit of resource (parameter, data, or FLOP), allowing comparison of different allocation strategies for a given budget.
Chinchilla Isomorphism Hypothesis
Postulate that for a fixed computational budget, model parameter count and training tokens must be increased proportionally to achieve optimal performance.
Kaplan's Law
Set of early scaling laws proposed by OpenAI (Kaplan et al., 2020) suggesting that performance is primarily a function of model size, with data volume playing a lesser role.
Pareto Frontier in Scaling
Set of optimal resource allocations (model size vs. data) where it is impossible to improve one factor without degrading the other, defining efficient trade-offs in scaling.
Scaling Performance Metric
Quantitative indicator (validation loss, perplexity, benchmark score) used to measure model effectiveness and track its improvement based on scaling different resources.
Predictability of Scaling
Ability of scaling laws to accurately anticipate the performance of models not yet trained, based on extrapolation of trends observed on smaller models.
Multi-Objective Optimization in Scaling
Process aimed at finding the best compromise between multiple conflicting objectives (performance, cost, latency) when determining the optimal model and data size.
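A toy sketch of one common approach, weighted scalarization: collapse two conflicting objectives (predicted loss and a latency proxy) into a single score and minimize it. The latency model and the weight below are invented purely for illustration.

```python
# Sketch: weighted scalarization of loss vs. serving latency over model sizes.
import numpy as np

ns = np.logspace(8, 11, 200)          # candidate parameter counts
loss = 1.69 + 406.4 / ns**0.34        # illustrative loss term (see Power Law entry)
latency = ns / 1e10                   # hypothetical: latency grows linearly with N

w = 0.5                               # relative weight of loss vs. latency
score = w * loss + (1 - w) * latency  # lower is better on both objectives
print(f"chosen size: N ≈ {ns[np.argmin(score)]:.2e} parameters")
```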