AI Glossary

A comprehensive dictionary of Artificial Intelligence

162 categories, 2,032 subcategories, 23,060 terms

Epsilon exploration rate

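Control parameter ε in the epsilon-greedy algorithm: the probability, between 0 and 1, that the agent explores by picking a random action instead of exploiting its current best estimate. Its value directly affects both how quickly learning converges and the quality of the final policy: too little exploration can miss better actions, while too much wastes steps on actions already known to be poor.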
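The role of ε can be illustrated with a minimal sketch of epsilon-greedy action selection. The function name `epsilon_greedy`, the list `q_values`, and the example numbers are assumptions made for illustration, not part of the glossary entry.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: every action is equally likely.
        return rng.randrange(len(q_values))
    # Exploit: action with the highest current estimate (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical value estimates for three actions.
q = [0.2, 0.8, 0.5]
rng = random.Random(0)  # seeded so the run is repeatable
picks = [epsilon_greedy(q, 0.1, rng) for _ in range(1000)]
```

With ε = 0.1, most entries of `picks` are the greedy action 1, with occasional random choices mixed in.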

Greedy action

The action with the highest estimated value according to the agent's current knowledge. In epsilon-greedy, the agent exploits it with probability 1-ε; because uniform random exploration can also land on it, its total selection probability is 1-ε+ε/n for n actions.
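As a small worked example, the exact selection probabilities under epsilon-greedy can be computed directly. Uniform exploration can also land on the greedy action, so it ends up with probability 1-ε+ε/n for n actions. The helper name `action_probabilities` and the sample estimates are hypothetical.

```python
def action_probabilities(q_values, epsilon):
    """Exact epsilon-greedy selection probabilities for each action."""
    n = len(q_values)
    # Greedy action: highest estimate (first one on ties).
    greedy = max(range(n), key=lambda a: q_values[a])
    # Exploration spreads epsilon uniformly over all n actions...
    probs = [epsilon / n] * n
    # ...and exploitation adds the remaining 1 - epsilon to the greedy action.
    probs[greedy] += 1.0 - epsilon
    return probs

probs = action_probabilities([0.2, 0.8, 0.5], epsilon=0.1)
```

Here the greedy action (index 1) gets 0.9 + 0.1/3, and each of the other two actions gets 0.1/3.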

Random exploration

Selecting an action uniformly at random from all available actions, regardless of their estimated values. In epsilon-greedy, this happens with probability ε so that the agent can discover rewarding actions that its current estimates undervalue.

Epsilon decay

Technique in which ε is gradually decreased over time, so that the agent explores heavily early on and exploits more as its estimates become reliable. Compared with a fixed ε, this typically gives more stable convergence toward a good policy.
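One common way to implement such a decay is an exponential schedule with a lower bound. The sketch below assumes that form; the function name `decayed_epsilon` and the default values (`eps_start`, `eps_min`, `decay_rate`) are illustrative choices, not prescribed by the glossary.

```python
def decayed_epsilon(step, eps_start=1.0, eps_min=0.05, decay_rate=0.99):
    """Exponentially decayed epsilon, never dropping below eps_min."""
    return max(eps_min, eps_start * decay_rate ** step)

# Epsilon at a few hypothetical training steps.
schedule = [decayed_epsilon(t) for t in (0, 100, 500, 10_000)]
```

The floor `eps_min` keeps a small amount of exploration alive late in training. That is a design choice rather than a requirement.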

Optimistic epsilon-greedy

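Variant of the algorithm that initializes action-value estimates with deliberately high (optimistic) values to encourage early exploration. As long as the initial estimates exceed the rewards actually achievable, each action's estimate drops once it is tried, which pushes the agent to test every action at least once.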
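The effect of optimistic initialization can be seen in a toy bandit. The sketch below sets ε = 0 to isolate the effect of the initial values and assumes deterministic rewards. The function `run_optimistic_greedy` and the reward numbers are hypothetical.

```python
def run_optimistic_greedy(true_rewards, initial_value, steps):
    """Greedy agent (epsilon = 0) on a deterministic bandit; returns estimates and pull counts."""
    n = len(true_rewards)
    q = [initial_value] * n   # every action starts with the same estimate
    counts = [0] * n
    for _ in range(steps):
        a = max(range(n), key=lambda i: q[i])      # greedy choice, first on ties
        counts[a] += 1
        q[a] += (true_rewards[a] - q[a]) / counts[a]  # sample-average update
    return q, counts

# Optimistic start: 5.0 is far above any achievable reward.
q_opt, counts_opt = run_optimistic_greedy([0.1, 0.5, 0.9], initial_value=5.0, steps=20)
# Neutral start at 0.0 for comparison.
q_zero, counts_zero = run_optimistic_greedy([0.1, 0.5, 0.9], initial_value=0.0, steps=20)
```

In this toy setting the optimistic agent tries each action once and then settles on the best one. The zero-initialized agent never leaves the first action it tried.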

Cumulative regret

Performance measure equal to the difference between the total reward that always playing the optimal action would have collected in expectation and the total reward the algorithm actually obtained. Lower cumulative regret, and slower growth of regret over time, indicates a more efficient learning policy.
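A minimal sketch of how cumulative regret could be tracked in a simulated bandit, where the true mean reward of every action is known. The function `cumulative_regret` and the example values are assumptions for illustration. In a real problem the true means are unknown, so regret can only be measured in simulation.

```python
def cumulative_regret(true_means, chosen_actions):
    """Running sum of (best mean reward - mean reward of the chosen action)."""
    best = max(true_means)
    total = 0.0
    curve = []
    for a in chosen_actions:
        total += best - true_means[a]  # per-step regret is zero for the optimal action
        curve.append(total)
    return curve

# Hypothetical run: the agent tries actions 0 and 1, finds 2, then slips back to 1 once.
curve = cumulative_regret([0.1, 0.5, 0.9], [0, 1, 2, 2, 1])
```

The curve stays flat while the agent plays the optimal action and rises whenever it does not.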

Algorithm convergence

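Property that the epsilon-greedy algorithm converges to the optimal policy under certain conditions. With a fixed ε > 0 the value estimates can converge, but the agent still takes a random action a fraction ε of the time. Converging to the optimal greedy policy requires ε to decay toward zero slowly enough that every action is still tried infinitely often, together with a sufficient number of iterations.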
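One commonly cited set of sufficient conditions on the exploration schedule, for the multi-armed bandit setting, is written out below as a sketch. The time-indexed notation ε_t and the example schedule ε_t = 1/t are illustrative assumptions, not taken from the glossary.

```latex
\epsilon_t \to 0 \quad \text{as } t \to \infty,
\qquad
\sum_{t=1}^{\infty} \epsilon_t = \infty,
\qquad
\text{e.g. } \epsilon_t = \frac{1}{t}.
```

The first condition makes the policy greedy in the limit. The second keeps exploration frequent enough that every action is still selected infinitely often, so its value estimate can converge.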

Value initialization

Process of assigning initial values to reward estimates for each action at the beginning of learning. The initialization strategy significantly influences the agent's initial exploratory behavior.

Pure greedy policy

Strategy with ε = 0: the agent always exploits the action it currently estimates to be best and never explores. Because actions that look poor early on are never tried again, this policy can lock onto a suboptimal action prematurely.

Epsilon annealing

Gradual, controlled reduction of ε during learning according to a predefined schedule (for example linear or exponential). Annealing gives a smooth transition from exploration to exploitation, which helps convergence.
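As one possible schedule, the sketch below anneals ε linearly from a start value to an end value over a fixed number of steps and then holds it constant. The function name `annealed_epsilon` and its defaults are hypothetical. Other schedule shapes are equally valid.

```python
def annealed_epsilon(step, eps_start=1.0, eps_end=0.05, anneal_steps=1000):
    """Linearly interpolate epsilon from eps_start to eps_end, then hold at eps_end."""
    fraction = min(step / anneal_steps, 1.0)  # progress through the schedule, capped at 1
    return eps_start + fraction * (eps_end - eps_start)

# Epsilon at the start, midpoint, end, and well past the end of the schedule.
values = [annealed_epsilon(t) for t in (0, 500, 1000, 5000)]
```

A linear schedule makes the length of the exploration phase explicit through `anneal_steps`, which can be easier to tune than a decay rate.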
