AI Glossary

A comprehensive dictionary of Artificial Intelligence

162 categories
2,032 subcategories
23,060 terms

Multi-Armed Bandit

Fundamental reinforcement learning problem where an agent must sequentially select among multiple options (arms) to maximize the sum of obtained rewards.
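As an illustration, a minimal simulation loop for a Bernoulli bandit might look like the following sketch; the selection rule is passed in as a function, and all names are illustrative:

```python
import random

def run_bandit(select_arm, arm_means, horizon, seed=0):
    """Simulate a Bernoulli bandit: repeatedly pick an arm, observe a 0/1 reward.

    select_arm(counts, values, t) returns the index of the arm to pull.
    """
    rng = random.Random(seed)
    counts = [0] * len(arm_means)    # pulls per arm
    values = [0.0] * len(arm_means)  # running mean reward per arm
    total = 0
    for t in range(1, horizon + 1):
        arm = select_arm(counts, values, t)
        reward = 1 if rng.random() < arm_means[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total, counts
```

Any of the selection strategies defined below (ε-greedy, UCB, Thompson sampling) can be plugged in as `select_arm`.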

Exploration-Exploitation Dilemma

Central conflict between exploring new options to discover their potential rewards and exploiting options known to be the most profitable.

Regret Rate

Performance measure quantifying the cumulative difference between obtained rewards and optimal ones, evaluating the effectiveness of the learning strategy.
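Cumulative regret can be computed directly from the mean reward of each arm actually pulled, compared against the best arm's mean (a small sketch; the helper name is illustrative):

```python
def cumulative_regret(pulled_means, optimal_mean):
    """Sum of per-step gaps between the optimal arm's mean and the chosen arm's mean."""
    return sum(optimal_mean - mu for mu in pulled_means)
```

For example, pulling arms with means 0.5, 0.9, 0.9 when the best mean is 0.9 yields a cumulative regret of 0.4.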

UCB Algorithm

Optimistic strategy that selects the arm with the highest upper confidence bound, balancing exploration and exploitation through statistical confidence intervals.
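A sketch of the classic UCB1 rule, where the bonus term sqrt(2 ln t / n) shrinks as an arm accumulates pulls; unplayed arms are tried first:

```python
import math

def ucb1_select(counts, values, t):
    """Pick the arm with the highest upper confidence bound at step t."""
    for arm, n in enumerate(counts):
        if n == 0:  # play every arm once before the bound is defined
            return arm
    return max(range(len(counts)),
               key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))
```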

ε-greedy Algorithm

Simple policy choosing the optimal arm with probability (1-ε) and exploring randomly with probability ε, controlling the exploration-exploitation trade-off.
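The ε-greedy rule fits in a few lines (a sketch; function name is illustrative):

```python
import random

def epsilon_greedy_select(values, epsilon=0.1, rng=random):
    """With probability epsilon pick a random arm, otherwise the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))                    # explore
    return max(range(len(values)), key=lambda a: values[a])  # exploit
```

Larger ε means more exploration; ε = 0 reduces to a purely greedy policy.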

Stochastic Reward

Random return following an unknown probability distribution associated with each arm, modeling the inherent uncertainty in real environments.

Action Policy

Rule or algorithm determining the choice of arm at each step based on accumulated information, defining the agent's behavior.

Bernoulli Distribution

Binary reward model (success/failure) frequently used in bandit problems, characterized by a single success probability parameter.

Bayesian Update

Iterative process of updating beliefs about reward distribution parameters by combining prior information and new observations.
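For Bernoulli rewards the Beta distribution is the conjugate prior, so the Bayesian update reduces to incrementing one of two counters; Thompson sampling then selects an arm by sampling from each posterior. A sketch under those assumptions, with illustrative names:

```python
import random

def beta_update(alpha, beta, reward):
    """Update a Beta(alpha, beta) belief after observing a 0/1 reward."""
    return alpha + reward, beta + (1 - reward)

def thompson_select(posteriors, rng=random):
    """Sample a success probability from each arm's Beta posterior; pick the max."""
    draws = [rng.betavariate(a, b) for a, b in posteriors]
    return max(range(len(draws)), key=lambda i: draws[i])
```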

Non-Stationary Bandit

Variant where reward distributions change over time, requiring adaptive strategies capable of tracking these variations.
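One simple adaptive strategy is a constant step size, which weights recent rewards exponentially more than old ones and can therefore track a drifting mean (a sketch):

```python
def ewma_update(estimate, reward, step_size=0.1):
    """Constant-step update Q <- Q + alpha * (r - Q): recent rewards dominate."""
    return estimate + step_size * (reward - estimate)
```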

Optimism in the Face of Uncertainty

Algorithmic principle favoring arms with high uncertainty and high reward potential, ensuring efficient exploration.

Convergence Rate

Speed at which the algorithm approaches the optimal policy, measuring the asymptotic efficiency of the learning strategy.

Adversarial Bandit

Scenario where rewards are chosen by an adversary rather than following stochastic distributions, requiring robust strategies.

Optimistic Initialization

Technique initializing reward estimates to high values to encourage early exploration of all available arms.
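A sketch: set every initial estimate above any plausible reward, and pair it with a constant step size so the optimism decays gradually instead of being erased by the first observed sample (names are illustrative):

```python
def init_optimistic(n_arms, optimistic_value=1.0):
    """Start every estimate above any realistic reward so each arm looks worth trying."""
    return [optimistic_value] * n_arms

def greedy_update(values, arm, reward, step_size=0.1):
    """Constant-step update; early low rewards pull the optimistic estimate down."""
    values[arm] += step_size * (reward - values[arm])
```

After a disappointing pull, the updated arm's estimate drops below the untouched arms, so a greedy policy naturally moves on to try them.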

Linear Bandit

Generalization where the expected reward is a linear function of contextual features, allowing for more complex structures.

Variance Reduction

Technique aimed at decreasing the uncertainty of reward estimates to accelerate convergence to the optimal policy.
