AI Glossary

A complete dictionary of Artificial Intelligence

162 categories, 2,032 subcategories, 23,060 terms

Nesterov Momentum

Variant of the momentum algorithm that applies a lookahead correction by calculating the gradient at the estimated future position, accelerating convergence and reducing oscillations.
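
As a rough illustration of the lookahead gradient, here is a minimal sketch for a single scalar parameter. The function name `nesterov_step`, the hyperparameter values, and the toy objective f(w) = w^2 are assumptions chosen for the example, not part of the glossary entry.

```python
def nesterov_step(w, v, grad_fn, lr=0.1, mu=0.9):
    """One Nesterov momentum update for a scalar parameter w (illustrative)."""
    lookahead = w + mu * v      # estimated future position
    g = grad_fn(lookahead)      # gradient evaluated at the lookahead point
    v = mu * v - lr * g         # velocity update
    return w + v, v

# Toy objective f(w) = w**2 with gradient 2*w (assumed for illustration).
w, v = 5.0, 0.0
for _ in range(100):
    w, v = nesterov_step(w, v, lambda x: 2.0 * x)
```

Classical momentum would differ only in evaluating `grad_fn(w)` instead of `grad_fn(lookahead)`.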

Adam (Adaptive Moment Estimation)

Optimization algorithm combining the ideas of Momentum and RMSprop, using estimates of the first and second moments of gradients to adapt the learning rates of each parameter.
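
The update can be sketched for one scalar parameter as follows. The name `adam_step` and the learning rate are illustrative assumptions; the bias-correction terms follow the commonly published form of Adam.

```python
import math

def adam_step(w, m, v, g, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = b1 * m + (1 - b1) * g        # first moment: running mean of gradients
    v = b2 * v + (1 - b2) * g * g    # second moment: running mean of squared gradients
    m_hat = m / (1 - b1 ** t)        # bias correction for the zero initialization
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Single step from w = 1.0 with gradient 2.0 (values assumed for illustration).
w, m, v = adam_step(1.0, 0.0, 0.0, g=2.0, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's scale.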

AdaGrad

Adaptive optimizer that divides each parameter's learning rate by the square root of its accumulated sum of squared gradients, so rarely updated parameters (for example, those tied to sparse features) keep larger effective step sizes.
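
A small sketch may make the per-parameter effect concrete. The function `adagrad_step` and the two-parameter toy gradients are assumptions for illustration: the first parameter receives a gradient at every step, the second only once.

```python
import math

def adagrad_step(w, acc, g, lr=0.1, eps=1e-8):
    """One AdaGrad update over lists of parameters, accumulators, and gradients."""
    acc = [a + gi * gi for a, gi in zip(acc, g)]   # historical sum of squared gradients
    w = [wi - lr * gi / (math.sqrt(a) + eps) for wi, gi, a in zip(w, g, acc)]
    return w, acc

w, acc = [0.0, 0.0], [0.0, 0.0]
for step in range(10):
    # Parameter 0 is updated every step; parameter 1 only on the last step.
    g = [1.0, 1.0 if step == 9 else 0.0]
    w, acc = adagrad_step(w, acc, g)
```

On the final step both parameters see the same gradient, but the rarely updated one takes the larger step because its accumulator is still small.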

AdaDelta

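Extension of AdaGrad that replaces the ever-growing sum of squared gradients with an exponentially decaying average, so the effective learning rate does not shrink toward zero. It also scales each step by a running average of past squared updates, which removes the need for a hand-set global learning rate.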
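
The following sketch shows one scalar update under the usual formulation with two decaying averages. The name `adadelta_step` and the values of `rho` and `eps` are illustrative assumptions.

```python
import math

def adadelta_step(w, eg2, edx2, g, rho=0.95, eps=1e-6):
    """One AdaDelta update for a scalar parameter (illustrative)."""
    eg2 = rho * eg2 + (1 - rho) * g * g                       # decaying average of squared gradients
    dx = -math.sqrt(edx2 + eps) / math.sqrt(eg2 + eps) * g    # step scaled by RMS of past updates
    edx2 = rho * edx2 + (1 - rho) * dx * dx                   # decaying average of squared updates
    return w + dx, eg2, edx2

# Single step from w = 1.0 with gradient 2.0 (values assumed for illustration).
w1, eg2, edx2 = adadelta_step(1.0, 0.0, 0.0, g=2.0)
```

Note that no learning rate appears in the signature; the step size comes entirely from the ratio of the two running averages.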

Learning Rate Decay

Strategy for progressively reducing the learning rate during training, often according to a predefined schedule (step, exponential, or cosine), to fine-tune convergence towards a minimum.
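
The three schedules named above can be sketched as plain functions of the epoch index. The function names and default constants here are assumptions for illustration, not values prescribed by the glossary.

```python
import math

def step_decay(lr0, epoch, drop=0.5, every=10):
    """Multiply the rate by `drop` once every `every` epochs."""
    return lr0 * drop ** (epoch // every)

def exponential_decay(lr0, epoch, k=0.1):
    """Shrink the rate continuously as exp(-k * epoch)."""
    return lr0 * math.exp(-k * epoch)

def cosine_decay(lr0, epoch, total_epochs, lr_min=0.0):
    """Follow half a cosine wave from lr0 down to lr_min over total_epochs."""
    cos_term = 1 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr0 - lr_min) * cos_term
```

For example, `step_decay(0.1, 25)` applies two halvings, and `cosine_decay(0.1, 50, 100)` sits halfway between the initial and minimum rates.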

LAMB Optimizer (Layer-wise Adaptive Moments optimizer for Batch training)

Optimization algorithm designed for large-scale training that rescales each layer's Adam-style update by the ratio of the layer's weight norm to the norm of that update, which keeps training effective at very large batch sizes.
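
A minimal sketch of the per-layer scaling, treating one layer as a flat list of weights. The names `lamb_layer_step` and `l2_norm`, the hyperparameter values, and the fallback trust ratio of 1.0 for zero norms are assumptions for illustration.

```python
import math

def l2_norm(xs):
    return math.sqrt(sum(x * x for x in xs))

def lamb_layer_step(w, m, v, g, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    """One LAMB-style update for a single layer given as a list of floats."""
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
    v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    # Adam-style direction plus decoupled weight decay.
    update = [mh / (math.sqrt(vh) + eps) + wd * wi
              for mh, vh, wi in zip(m_hat, v_hat, w)]
    w_norm, u_norm = l2_norm(w), l2_norm(update)
    # Layer-wise trust ratio; falls back to 1.0 if either norm is zero.
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    w = [wi - lr * trust * ui for wi, ui in zip(w, update)]
    return w, m, v

w0 = [3.0, 4.0]  # toy layer with norm 5 (assumed for illustration)
w1, m1, v1 = lamb_layer_step(w0, [0.0, 0.0], [0.0, 0.0], [0.5, -0.2], t=1)
step_norm = l2_norm([a - b for a, b in zip(w1, w0)])
```

Because of the trust ratio, the length of the step is `lr` times the layer's weight norm, independent of how large the raw update was.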

LARS Optimizer (Layer-wise Adaptive Rate Scaling)

Optimization method that adapts the learning rate for each layer based on the ratio between the norm of weights and the norm of gradients, particularly suitable for training with large batches.
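
The norm ratio can be sketched for one layer as below. This is a simplified version without momentum; the names `lars_layer_step` and `l2_norm` and the value of `trust_coef` are assumptions for illustration.

```python
import math

def l2_norm(xs):
    return math.sqrt(sum(x * x for x in xs))

def lars_layer_step(w, g, lr=0.1, trust_coef=0.001, wd=0.0, eps=1e-9):
    """One LARS-style update for a single layer (momentum omitted for brevity)."""
    w_norm, g_norm = l2_norm(w), l2_norm(g)
    # Local learning rate: ratio of weight norm to gradient norm.
    local_lr = trust_coef * w_norm / (g_norm + wd * w_norm + eps)
    return [wi - lr * local_lr * (gi + wd * wi) for wi, gi in zip(w, g)]

w0 = [3.0, 4.0]                                  # toy layer with norm 5
small = lars_layer_step(w0, [30.0, 40.0])        # gradient norm 50
large = lars_layer_step(w0, [300.0, 400.0])      # same direction, norm 500
```

Both calls produce nearly the same new weights, which illustrates why the method tolerates the large gradient-scale differences between layers that appear with big batches.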

Lookahead Optimizer

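Optimization wrapper that keeps two sets of weights: an inner optimizer updates the 'fast' weights for k steps, after which the 'slow' weights are moved part of the way toward the latest fast weights and the fast weights are reset to them. This can improve generalization and convergence stability.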
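
A compact sketch of the slow/fast weight loop, using plain gradient descent as the inner optimizer. The function name `lookahead_sgd`, the hyperparameter values, and the scalar toy objective are assumptions for illustration.

```python
def lookahead_sgd(w0, grad_fn, outer_steps=20, k=5, alpha=0.5, lr=0.1):
    """Lookahead wrapped around plain SGD for a scalar parameter (illustrative)."""
    slow = w0
    for _ in range(outer_steps):
        fast = slow                        # fast weights start from the slow weights
        for _ in range(k):
            fast -= lr * grad_fn(fast)     # k steps of the inner optimizer
        slow += alpha * (fast - slow)      # move slow weights toward the fast weights
    return slow

# Toy objective f(w) = w**2 with gradient 2*w (assumed for illustration).
w_final = lookahead_sgd(5.0, lambda x: 2.0 * x)
```

Setting `alpha=1.0` would reduce this to running the inner optimizer alone, so `alpha` controls how cautiously the slow weights follow.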

RAdam (Rectified Adam)

A variant of Adam that rectifies the high variance of the adaptive learning rate in the early stages of training, when the second-moment estimate rests on few samples, offering more stable convergence without requiring a warmup phase.
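
The rectification can be sketched for a scalar parameter as follows. The name `radam_step` and the learning rate are assumptions for illustration; the threshold of 4 on `rho_t` follows the commonly published form of the algorithm.

```python
import math

def radam_step(w, m, v, g, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One RAdam update for a scalar parameter; t is the 1-based step count."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    rho_inf = 2.0 / (1 - b2) - 1
    rho_t = rho_inf - 2.0 * t * b2 ** t / (1 - b2 ** t)
    if rho_t > 4:
        # Variance is tractable: apply the rectified adaptive step.
        v_hat = math.sqrt(v / (1 - b2 ** t))
        r = math.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf
                      / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        w -= lr * r * m_hat / (v_hat + eps)
    else:
        # Too few samples: fall back to a momentum-only step.
        w -= lr * m_hat
    return w, m, v

# Single step from w = 1.0 with gradient 2.0 (values assumed for illustration).
w, m, v = radam_step(1.0, 0.0, 0.0, g=2.0, t=1)
```

At `t=1` the value of `rho_t` is about 1, so the first update takes the unadapted branch; the adaptive branch only engages once enough gradients have been seen.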

SWATS (Switching from Adam to SGD)

A strategy that starts training with an adaptive optimizer like Adam for fast convergence, then switches to Stochastic Gradient Descent (SGD) for better generalization.

Yogi Optimizer

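A modification of Adam that changes the second-moment estimate additively, with the sign of the change set by whether the current squared gradient is larger or smaller than the estimate, instead of using an exponential moving average. This keeps the effective learning rate from growing too quickly when gradients shrink, which can stabilize convergence.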
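
The difference from Adam lies only in the second-moment rule, which the sketch below isolates. The function names are assumptions for illustration; the Yogi rule shown is the commonly published form v - (1 - b2) * sign(v - g^2) * g^2.

```python
def adam_second_moment(v, g, b2=0.999):
    """Adam: exponential moving average of squared gradients."""
    return b2 * v + (1 - b2) * g * g

def yogi_second_moment(v, g, b2=0.999):
    """Yogi: additive change whose size depends only on g**2."""
    g2 = g * g
    sign = (v > g2) - (v < g2)      # +1, 0, or -1
    return v - (1 - b2) * sign * g2

# Large estimate, small incoming gradient (values assumed for illustration).
v_adam = adam_second_moment(1.0, 0.1)
v_yogi = yogi_second_moment(1.0, 0.1)
```

With a small gradient arriving after a large estimate, Yogi lowers the estimate far less than Adam does, so the effective step size grows more gradually.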

Shampoo

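An optimizer that preconditions gradients with Kronecker-factored matrices built from accumulated gradient statistics, one factor per tensor dimension, approximating full-matrix AdaGrad. Commonly described as a second-order method, it can accelerate convergence on ill-conditioned problems.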

Learning Rate Restart

A cyclical technique where the learning rate is periodically reset to its initial value, allowing the model to escape local minima and explore new regions of the solution space.
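
One common form pairs the reset with cosine annealing inside each cycle. The sketch below assumes a fixed cycle length; the name `warm_restart_lr` and the default values are illustrative.

```python
import math

def warm_restart_lr(step, lr_max=0.1, lr_min=0.0, cycle_len=10):
    """Cosine-annealed rate that jumps back to lr_max every cycle_len steps."""
    t_cur = step % cycle_len        # position inside the current cycle
    cos_term = 1 + math.cos(math.pi * t_cur / cycle_len)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term

schedule = [warm_restart_lr(s) for s in range(30)]
```

The resulting `schedule` decays within each block of ten steps and returns to `lr_max` at steps 10 and 20.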
