Scaling Laws - AI Glossary

📖

term

Scaling Law

Empirical relationship, usually expressed as a power law, that predicts a language model's performance from three key factors: model size (number of parameters), volume of training data, and the amount of compute used for training.

📖

term

Chinchilla Law

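Empirical rule from DeepMind's Chinchilla experiments (Hoffmann et al., 2022), stating that for a fixed compute budget, model size and training data volume should be scaled in roughly equal proportion (about 20 training tokens per parameter), contrary to the earlier assumption that model size should grow much faster than data.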

📖

term

Computational Power (Compute)

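Computational resource consumed by training, measured in total floating-point operations (FLOPs); hardware throughput, by contrast, is measured in FLOPS (floating-point operations per second). Compute is the third pillar of scaling laws and, together with the available hardware, determines the duration and feasibility of training large language models.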
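
A common rule of thumb estimates total training compute as about 6 floating-point operations per parameter per training token, C ≈ 6ND. The sketch below applies that approximation and converts the result to wall-clock time. The function names, the 70B-parameter example, and the sustained throughput figure are illustrative assumptions, not values from this glossary.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs using C ~ 6 * N * D."""
    # Assumption: roughly 2 FLOPs per parameter per token for the forward pass
    # and roughly 4 for the backward pass; attention and other overheads are ignored.
    return 6.0 * n_params * n_tokens


def training_days(total_flops: float, sustained_flops_per_second: float) -> float:
    """Convert a FLOP budget into wall-clock days at a given sustained throughput."""
    seconds = total_flops / sustained_flops_per_second
    return seconds / 86_400.0


if __name__ == "__main__":
    # Hypothetical run: 70B parameters trained on 1.4T tokens.
    c = training_flops(70e9, 1.4e12)
    print(f"total compute: {c:.3e} FLOPs")
    # Hypothetical cluster sustaining 1e18 FLOP/s.
    print(f"days of training: {training_days(c, 1e18):.1f}")
```

Note the distinction the code makes explicit: `training_flops` returns a count of operations, while `sustained_flops_per_second` is a rate.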

📖

term

Isomorphic Scaling

Scaling strategy where model size (N) and data volume (D) increase in proportion to each other, D ∝ N (the Chinchilla results suggest roughly D ≈ 20N), which approximately optimizes performance for a given compute budget.
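
As a worked illustration, suppose data is kept at a fixed ratio k tokens per parameter (D = kN) and training compute follows the common approximation C ≈ 6ND. Both the ratio k and the factor 6 are assumptions of this sketch. Under them, model size and data volume each grow with the square root of compute:

```latex
\begin{align}
C &\approx 6ND, \qquad D = kN \\
C &\approx 6kN^{2} \\
N &\approx \sqrt{\frac{C}{6k}}, \qquad
D = kN \approx \sqrt{\frac{kC}{6}}
\end{align}
```

So a 100-fold increase in compute corresponds to about a 10-fold increase in both N and D.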

📖

term

Test Loss

Performance metric, usually the cross-entropy loss on held-out data, used as the dependent variable in scaling laws to quantify a model's effectiveness on unseen data.

📖

term

Scaling Exponent

Exponent in the power-law equation (e.g., α in L(N) ∝ N^(-α)) that determines how quickly test loss decreases as a variable such as model size or data volume increases.
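
To make the role of the exponent concrete, the sketch below assumes a pure power law L(X) = a·X^(-α) with no irreducible loss term. It computes how much the loss changes when a resource is multiplied by some factor, and how much scaling is needed for a target reduction. The function names and the value α = 0.076 are illustrative assumptions.

```python
def loss_ratio(scale_factor: float, alpha: float) -> float:
    """Return new_loss / old_loss when a resource is multiplied by scale_factor,
    assuming a pure power law L(X) = a * X**(-alpha)."""
    return scale_factor ** (-alpha)


def scale_factor_for_reduction(target_ratio: float, alpha: float) -> float:
    """Return the resource multiplier needed so that new_loss / old_loss
    equals target_ratio under the same pure power law."""
    return target_ratio ** (-1.0 / alpha)


if __name__ == "__main__":
    alpha = 0.076  # illustrative exponent, not a value from this glossary
    print("loss ratio after doubling the resource:", loss_ratio(2.0, alpha))
    print("multiplier needed for a 10% lower loss:",
          scale_factor_for_reduction(0.9, alpha))
```

With small exponents, modest loss reductions require large multiplicative increases in the resource, which is why the exponent matters so much in planning.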

📖

term

Scaling Transfer

Phenomenon where scaling laws observed on smaller models and more limited datasets can be extrapolated to accurately predict the performance of much larger models.

📖

term

Compute Budget Optimization

Process of allocating resources between model size, data, and training time to maximize final performance under a total compute budget constraint, guided by scaling laws.
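
One simple way to sketch such an allocation in code is to fix a tokens-per-parameter ratio and solve C ≈ 6ND for N and D. The default ratio of 20 follows the commonly cited Chinchilla rule of thumb; the function name, the factor 6, and the example budget are assumptions of this sketch rather than a definitive procedure.

```python
import math


def compute_optimal_allocation(compute_flops: float,
                               tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a FLOP budget into (n_params, n_tokens).

    Assumes C ~ 6 * N * D and a fixed ratio D = tokens_per_param * N.
    """
    if compute_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("compute_flops and tokens_per_param must be positive")
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Hypothetical budget of 5.88e23 FLOPs.
    n, d = compute_optimal_allocation(5.88e23)
    print(f"parameters: {n:.3e}, tokens: {d:.3e}")
```

Changing `tokens_per_param` shows how sensitive the split is to the assumed ratio, which in practice depends on the dataset and architecture.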

📖

term

Sub-Optimal Scaling Regime

A situation where a model is trained with an imbalance between its size and the data volume, for example a large model on little data, leading to performance lower than that predicted by optimal scaling laws.
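
The effect of such an imbalance can be illustrated with a parametric loss of the form L(N, D) = E + A/N^α + B/D^β. The constants below are approximately those reported for this fit by Hoffmann et al. (2022) and are used purely for illustration; treat them, the function name, and the two example configurations as assumptions.

```python
def parametric_loss(n_params: float, n_tokens: float,
                    e: float = 1.69, a: float = 406.4, b: float = 410.7,
                    alpha: float = 0.34, beta: float = 0.28) -> float:
    """Estimate test loss as E + A / N**alpha + B / D**beta.

    Default constants are illustrative assumptions (see the text above).
    """
    return e + a / n_params ** alpha + b / n_tokens ** beta


if __name__ == "__main__":
    # Two hypothetical configurations with the same 6 * N * D compute budget.
    balanced = parametric_loss(70e9, 1.4e12)    # about 20 tokens per parameter
    imbalanced = parametric_loss(280e9, 350e9)  # large model, little data
    print(f"balanced:   {balanced:.4f}")
    print(f"imbalanced: {imbalanced:.4f}")
```

Under these assumed constants, the larger model trained on fewer tokens ends with the higher estimated loss despite using the same compute.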

📖

term

Power Law

A mathematical relationship of the form Y = aX^b that underpins AI scaling laws, describing how a performance metric (Y) systematically varies with an input resource (X) such as the number of parameters.
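
Because log Y = log a + b·log X, a power law appears as a straight line on log-log axes, so a and b can be estimated with ordinary least squares on the logarithms. The sketch below does this using only the standard library; the function names and the synthetic data are assumptions for illustration.

```python
import math


def fit_power_law(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit Y = a * X**b by least squares on log(Y) = log(a) + b * log(X)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    if sxx == 0.0:
        raise ValueError("x values must not all be identical")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict(a: float, b: float, x: float) -> float:
    """Evaluate the fitted power law at x."""
    return a * x ** b


if __name__ == "__main__":
    # Synthetic data generated from Y = 5 * X**(-0.1); not real measurements.
    sizes = [1e6, 1e7, 1e8, 1e9]
    losses = [5.0 * n ** -0.1 for n in sizes]
    a, b = fit_power_law(sizes, losses)
    print(f"a = {a:.4f}, b = {b:.4f}")
    print(f"extrapolated value at X = 1e11: {predict(a, b, 1e11):.4f}")
```

Real measurements are noisy and often include an irreducible loss term, so a straight-line fit in log space is only a first approximation.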

📖

term

Number of Parameters (Model Size)

A fundamental variable in scaling laws: the total number of trainable weights in a neural network, which determines the model's capacity to memorize and generalize.

📖

term

Training Data Volume (Dataset Size)

The quantity of unique tokens or words used to train a model, the increase of which is essential to avoid overfitting and to realize the full performance potential predicted by scaling laws.

📖

term

Predictive Performance

A model's ability to make accurate predictions on new data, quantified by test loss; it is the target variable that scaling laws seek to predict.

📖

term

Kaplan's Hypothesis

A scaling result preceding the Chinchilla law (Kaplan et al., OpenAI, 2020), which held that, as compute grows, performance improves most efficiently by increasing model size much faster than the number of training tokens.

📖

term

Pareto Frontier in Scaling

The set of resource allocations (model size, data, compute) for which test loss cannot be lowered without spending more of at least one resource, illustrating the trade-offs in scaling.

📖

term

Loss Convergence

The tendency of test loss to decrease and level off toward an irreducible floor as resources (model, data, compute) are increased, following a predictable trajectory described by scaling laws.

📖

term

Data Scaling

Axis of the Chinchilla law that examines how increasing the volume and diversity of training data affects model performance at a given model size.

📖

term

Model Scaling

Process of increasing the number of parameters in a language model, which, according to scaling laws, must be accompanied by a proportional increase in data to achieve optimal performance.
