AI Glossary

A comprehensive dictionary of Artificial Intelligence

162 categories, 2,032 subcategories, 23,060 terms

Model-Based Reinforcement Learning

Reinforcement learning approach in which the agent builds an internal model of the environment and uses it to simulate transitions, generating experience without interacting with the real environment.

Dyna-Q

Hybrid reinforcement learning algorithm that combines direct learning from real experience with planning, in which a learned model generates additional simulated experience for further value updates.
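
A minimal tabular sketch of the Dyna-Q loop, assuming a deterministic environment exposed through a hypothetical `env_step(state, action) -> (reward, next_state, done)` function and a toy six-state chain (`chain_step`) invented for illustration. Each real step performs a direct Q-learning update, records the outcome in the model, and then performs `n_planning` updates on transitions replayed from the model.

```python
import random
from collections import defaultdict


def dyna_q(env_step, start_state, actions, episodes=50, n_planning=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch for a deterministic environment (assumption)."""
    rng = random.Random(seed)
    q = defaultdict(float)   # Q[(s, a)], initialised to 0.0
    model = {}               # model[(s, a)] = (reward, next_state, done)
    observed = []            # state-action pairs seen in real experience

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])  # random tie-break

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = env_step(s, a)
            update(s, a, r, s2, done)                 # direct RL from real experience
            if (s, a) not in model:
                observed.append((s, a))
            model[(s, a)] = (r, s2, done)             # model learning (last outcome)
            for _ in range(n_planning):               # planning on simulated experience
                ps, pa = rng.choice(observed)
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q


def chain_step(s, a):
    """Toy chain with states 0..5; reaching state 5 gives reward 1 and ends the episode."""
    s2 = min(max(s + a, 0), 5)
    return (1.0 if s2 == 5 else 0.0), s2, s2 == 5


q = dyna_q(chain_step, start_state=0, actions=[-1, 1])
print([max([-1, 1], key=lambda a: q[(s, a)]) for s in range(5)])
```

Setting `n_planning=0` turns the sketch into plain one-step Q-learning, which makes it easy to compare the two on the same toy task.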

Direct learning

Process of updating action values or policy based solely on real experiences accumulated during interaction with the environment.

Planning in reinforcement learning

Use of an environment model to generate synthetic experience and improve the policy without additional interaction with the real environment.

Transition model

Component of a learned environment model that estimates the probability distribution over next states given the current state and an action, p(s' | s, a).
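
A minimal count-based sketch for the tabular case: p(s' | s, a) is estimated as the fraction of observed transitions from (s, a) that ended in s'. The class `CountTransitionModel` and its methods are illustrative names, not a library API.

```python
import random
from collections import Counter, defaultdict


class CountTransitionModel:
    """Maximum-likelihood estimate of p(s' | s, a) from transition counts."""

    def __init__(self):
        self.counts = defaultdict(Counter)   # (s, a) -> Counter({s_next: count})

    def update(self, s, a, s_next):
        self.counts[(s, a)][s_next] += 1

    def probs(self, s, a):
        c = self.counts.get((s, a), Counter())
        total = sum(c.values())
        return {s2: n / total for s2, n in c.items()} if total else {}

    def sample(self, s, a, rng=random):
        """Draw a next state; only valid for pairs that have been observed."""
        dist = self.probs(s, a)
        return rng.choices(list(dist), weights=list(dist.values()))[0]


model = CountTransitionModel()
for s_next in ["B", "B", "B", "C"]:
    model.update("A", "go", s_next)
print(model.probs("A", "go"))  # {'B': 0.75, 'C': 0.25}
```

In large or continuous state spaces the counts would typically be replaced by a function approximator.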

Reward model

Learned function that predicts the expected reward for each state-action pair in a reinforcement learning environment.
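
One simple way to learn a tabular reward model, assuming rewards are observed per (s, a) pair: keep a running mean so that the prediction approximates E[R | s, a]. Returning 0.0 for unseen pairs is an arbitrary default chosen for this sketch, and `MeanRewardModel` is a hypothetical name.

```python
from collections import defaultdict


class MeanRewardModel:
    """Predicts r(s, a) as the running mean of rewards observed for (s, a)."""

    def __init__(self):
        self.n = defaultdict(int)       # number of observations per pair
        self.mean = defaultdict(float)  # current mean reward per pair

    def update(self, s, a, r):
        key = (s, a)
        self.n[key] += 1
        self.mean[key] += (r - self.mean[key]) / self.n[key]  # incremental mean

    def predict(self, s, a):
        return self.mean.get((s, a), 0.0)  # 0.0 for unseen pairs (assumption)


rm = MeanRewardModel()
for r in [1.0, 0.0, 2.0]:
    rm.update("s", "a", r)
print(rm.predict("s", "a"))  # 1.0
```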

Simulated experiences

Samples generated by the agent's internal environment model, used to accelerate learning without requiring additional real interactions.

Value update

Iterative process of adjusting action-value estimates Q(s,a) toward the observed reward plus the discounted value of the next state, as prescribed by the Bellman equation.
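
For the one-step Q-learning target used in Dyna-Q, the update is Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]. The sketch below keeps Q in a plain dict keyed by (state, action); the function name `q_update` and the `terminal` flag are illustrative assumptions.

```python
def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Value of the best action in the next state (0 if the episode ended).
    future = 0.0 if terminal else max(q.get((s_next, b), 0.0) for b in actions)
    td_error = r + gamma * future - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return td_error


q = {("s2", "a"): 1.0, ("s2", "b"): 2.0}
q_update(q, "s1", "a", 1.0, "s2", ["a", "b"], alpha=0.5, gamma=0.5)
print(q[("s1", "a")])  # 1.0
```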

Experience replay buffer

Data structure storing tuples (state, action, reward, next_state) to allow repeated updates during the planning phase.
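
A minimal FIFO sketch built on `collections.deque`. The fixed capacity and the drop-oldest eviction are assumptions, since the definition does not specify an eviction policy; `sample` draws a batch without replacement.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped when full

    def add(self, state, action, reward, next_state):
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)  # without replacement

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.add(i, 0, float(i), i + 1)
print([t.state for t in buf.buffer])  # [2, 3, 4]
```

Tabular Dyna-Q often stores its model as a table keyed by (state, action) instead; a bounded buffer like this one is the more common choice when experiences feed a function approximator.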

Dyna-Q+

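Extension of Dyna-Q that encourages exploration by adding a bonus to the modeled reward that grows with the time elapsed since a state-action pair was last tried (kappa * sqrt(tau), for tau elapsed time steps and a small constant kappa), so the agent can detect and adapt to changes in the environment.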
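
A minimal sketch of the bonus computation only: a pair last tried for real at step `last_tried` is planned at step `now` with reward r + kappa * sqrt(tau). The function name and the default `kappa` are assumptions, and keeping `last_tried` up to date during real steps is left to the caller.

```python
import math


def dyna_q_plus_reward(modeled_reward, last_tried, now, kappa=1e-3):
    """Reward used in a Dyna-Q+ planning update: r + kappa * sqrt(tau),
    where tau = now - last_tried counts time steps since (s, a) was last tried for real."""
    tau = now - last_tried
    return modeled_reward + kappa * math.sqrt(tau)


# A pair last tried at step 6 and planned at step 10 (tau = 4), with kappa = 0.5:
print(dyna_q_plus_reward(1.0, last_tried=6, now=10, kappa=0.5))  # 2.0
```

Because the bonus only enters simulated updates, long-untried pairs look increasingly attractive during planning, which pushes the agent to re-test them in the real environment.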

Prioritized sweeping

Planning variant of Dyna-Q that orders updates in a priority queue by the magnitude of their expected effect on the values and propagates changes backward to predecessor state-action pairs, making the planning phase more computationally efficient.
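
A sketch of the planning phase of tabular prioritized sweeping, assuming a deterministic model stored as `model[(s, a)] = (reward, next_state)` and a hypothetical `predecessors[s]` set of pairs known to lead to `s`. It is meant to run after each real step, seeded with the pair just updated. For brevity, duplicate queue entries are allowed instead of raising an existing entry's priority in place.

```python
import heapq
import itertools
from collections import defaultdict


def td_error(q, model, s, a, actions, gamma):
    r, s2 = model[(s, a)]
    return r + gamma * max(q[(s2, b)] for b in actions) - q[(s, a)]


def prioritized_sweeping(q, model, predecessors, actions, seed_pair,
                         n_updates=20, alpha=0.1, gamma=0.95, theta=1e-4):
    """q must be a defaultdict(float); updates it in place and returns it."""
    tie = itertools.count()                  # tie-breaker so heapq never compares pairs
    queue = []
    s0, a0 = seed_pair
    p = abs(td_error(q, model, s0, a0, actions, gamma))
    if p > theta:
        heapq.heappush(queue, (-p, next(tie), seed_pair))  # negate for max-priority
    for _ in range(n_updates):
        if not queue:
            break
        _, _, (s, a) = heapq.heappop(queue)
        q[(s, a)] += alpha * td_error(q, model, s, a, actions, gamma)
        for ps, pa in predecessors.get(s, ()):  # pairs whose value may now change
            p = abs(td_error(q, model, ps, pa, actions, gamma))
            if p > theta:
                heapq.heappush(queue, (-p, next(tie), (ps, pa)))
    return q


# Chain 0 -> 1 -> 2 (terminal); reward 1 on reaching state 2.
model = {(0, "r"): (0.0, 1), (1, "r"): (1.0, 2)}
predecessors = {1: {(0, "r")}, 2: {(1, "r")}}
q = prioritized_sweeping(defaultdict(float), model, predecessors, ["r"],
                         seed_pair=(1, "r"), alpha=1.0)
print(q[(0, "r")], q[(1, "r")])  # 0.95 1.0
```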

Planning effect

Acceleration of learning observed when the number of planning steps per real step increases, up to a point of diminishing returns.

Algorithm convergence

Property that Dyna-Q's value estimates converge to the optimal action values under suitable conditions, such as an exact model, every state-action pair being updated infinitely often, and an appropriately decaying learning rate.

Model error

Discrepancy between the actual behavior of the environment and the predictions of the learned model; if it is not detected and corrected, planning with the inaccurate model can produce a suboptimal policy.

Computational complexity

Computational cost of Dyna-Q: per real step it grows linearly with the number of planning updates, while the memory required grows with the number of stored transitions (the size of the experience buffer).

Model generalization

Ability of the model to extend its predictions to unseen state-action pairs, often implemented using neural networks or other function approximators.

State space sampling

Strategy for choosing which stored experiences (previously observed state-action pairs) to simulate during the planning phase; it influences the learning efficiency of Dyna-Q.
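
Tabular Dyna-Q, as presented by Sutton and Barto, picks a previously observed state uniformly at random and then an action previously taken in that state. A minimal sketch of that sampling strategy; the class `ObservedPairs` and its methods are hypothetical names.

```python
import random
from collections import defaultdict


class ObservedPairs:
    """Tracks which actions have been taken in which states, for planning-time sampling."""

    def __init__(self):
        self.actions_by_state = defaultdict(list)

    def record(self, s, a):
        if a not in self.actions_by_state[s]:
            self.actions_by_state[s].append(a)

    def sample(self, rng=random):
        s = rng.choice(list(self.actions_by_state))  # uniform over observed states
        a = rng.choice(self.actions_by_state[s])     # uniform over actions tried in s
        return s, a


pairs = ObservedPairs()
for s, a in [(0, "left"), (0, "right"), (1, "right"), (0, "left")]:
    pairs.record(s, a)
print(dict(pairs.actions_by_state))  # {0: ['left', 'right'], 1: ['right']}
print(pairs.sample(random.Random(0)))
```

This is not the same as sampling uniformly over pairs: here `(1, "right")` is drawn with probability 1/2, because state 1 has only one recorded action.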

Planning function

Algorithmic component that performs repeated updates on stored experiences to refine value estimates without new interaction with the environment.

Adaptive learning rate

Mechanism for dynamically adjusting the learning rate in Dyna-Q to improve convergence, taking into account the differing variance of real and simulated experiences.
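
The definition does not fix a particular scheme. One simple option, assumed here for illustration and not necessarily what any given Dyna-Q implementation uses, is a per-pair rate that decays with the visit count, alpha(s, a) = 1 / N(s, a)^omega. This sketch does not estimate variance explicitly and treats real and simulated updates alike; the class name is hypothetical.

```python
from collections import defaultdict


class VisitCountLearningRate:
    """Per-pair learning rate alpha(s, a) = 1 / N(s, a) ** omega.

    With 0.5 < omega <= 1 the step sizes satisfy the Robbins-Monro conditions
    (sum of alpha diverges, sum of alpha^2 converges).
    """

    def __init__(self, omega=0.8):
        self.omega = omega
        self.visits = defaultdict(int)  # N(s, a): number of updates so far

    def __call__(self, s, a):
        self.visits[(s, a)] += 1
        return 1.0 / self.visits[(s, a)] ** self.omega


lr = VisitCountLearningRate(omega=1.0)
print([lr("s", "a") for _ in range(4)])  # [1.0, 0.5, 0.3333333333333333, 0.25]
```

Keeping separate counters for real and simulated updates would be one way to account for their different reliability, but that is a further design assumption.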
