
AI Glossary

The complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Model-Based Reinforcement Learning

Reinforcement learning approach where the agent builds an internal model of the environment to simulate transitions and generate experiences without real interaction.

Dyna-Q

Hybrid reinforcement learning algorithm that combines direct learning from real experience with planning on a learned model, which generates additional simulated experiences.
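
This loop can be sketched in tabular form. The environment interface (`env_step`), the hyperparameter values, and the deterministic model are all illustrative assumptions, not a definitive implementation:

```python
import random
from collections import defaultdict

def dyna_q(env_step, start, actions, episodes=50, n_planning=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q sketch: direct learning on each real step, followed
    by n_planning replayed updates on a learned deterministic model.
    `env_step(s, a) -> (reward, next_state, done)` is an assumed interface."""
    Q = defaultdict(float)   # action values Q[(s, a)]
    model = {}               # learned deterministic model: (s, a) -> (r, s')
    for _ in range(episodes):
        s, done = start, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            r, s2, done = env_step(s, a)
            # (a) direct learning from the real transition
            best = max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            # (b) model learning: remember what the environment did
            model[(s, a)] = (r, s2)
            # (c) planning: replay random modeled transitions
            keys = list(model)
            for _ in range(n_planning):
                ps, pa = random.choice(keys)
                pr, ps2 = model[(ps, pa)]
                pbest = max(Q[(ps2, b)] for b in actions)
                Q[(ps, pa)] += alpha * (pr + gamma * pbest - Q[(ps, pa)])
            s = s2
    return Q
```

On a toy chain environment, the planning steps let value information propagate back from the goal much faster than real steps alone would.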

Direct learning

Process of updating action values or policy based solely on real experiences accumulated during interaction with the environment.

Planning in reinforcement learning

Use of an environment model to generate synthetic experiences and improve the policy without additional interaction with the real environment.
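
A planning phase over a learned deterministic model might look like the following sketch; the `model` mapping and the hyperparameter values are illustrative assumptions:

```python
import random

def planning_steps(Q, model, actions, n=10, alpha=0.1, gamma=0.95):
    """Planning sketch: sample stored (s, a) -> (r, s') transitions from a
    learned deterministic model and replay Bellman updates on them,
    without any new environment interaction."""
    pairs = list(model)
    for _ in range(n):
        s, a = random.choice(pairs)      # pick a remembered transition
        r, s2 = model[(s, a)]
        best = max(Q.get((s2, b), 0.0) for b in actions)
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + alpha * (r + gamma * best - old)
```

Each call refines the value estimates using only memory, which is why planning is sometimes described as "learning from imagined experience".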

Transition model

Component of the predictive environment model that estimates the probability distribution of next states given a current state and an action.
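
For stochastic environments, such a model can be estimated from visit counts; this count-based class is one simple, purely illustrative possibility:

```python
from collections import defaultdict

class TransitionModel:
    """Count-based estimate of P(s' | s, a): a minimal tabular sketch."""
    def __init__(self):
        # counts[(s, a)][s2] = number of times (s, a) led to s2
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, s, a, s2):
        self.counts[(s, a)][s2] += 1

    def prob(self, s, a, s2):
        # empirical probability; 0.0 for never-visited pairs
        total = sum(self.counts[(s, a)].values())
        return self.counts[(s, a)][s2] / total if total else 0.0
```
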

Reward model

Learned function that predicts the expected reward for each state-action pair in a reinforcement learning environment.

Simulated experiences

Samples generated artificially by the internal environment model to accelerate learning without requiring additional real interactions.

Value update

Iterative process of adjusting action-value estimates Q(s,a) based on observed rewards and the values of future states according to the Bellman equation.
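
A single tabular update of this kind might be written as follows; the variable names and hyperparameter values are illustrative:

```python
def value_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.95):
    """One tabular value update: move Q(s, a) toward the Bellman target
    r + gamma * max_b Q(s', b), by a step of size alpha."""
    old = Q.get((s, a), 0.0)
    target = r + gamma * max(Q.get((s2, b), 0.0) for b in actions)
    Q[(s, a)] = old + alpha * (target - old)
    return Q[(s, a)]
```
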

Experience replay buffer

Data structure storing tuples (state, action, reward, next_state) to allow repeated updates during the planning phase.
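
A minimal sketch of such a buffer, assuming uniform sampling and a fixed capacity (both are common but not the only choices):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples;
    old entries are evicted automatically once capacity is reached."""
    def __init__(self, capacity=10000):
        self.data = deque(maxlen=capacity)

    def push(self, s, a, r, s2):
        self.data.append((s, a, r, s2))

    def sample(self, k):
        # uniform sample without replacement, capped at the buffer size
        return random.sample(self.data, min(k, len(self.data)))

    def __len__(self):
        return len(self.data)
```
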

Dyna-Q+

Extension of Dyna-Q incorporating an exploration mechanism based on the time elapsed since the last state-action pair visit to detect and adapt to environmental changes.
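
In the formulation given by Sutton and Barto, the planning reward receives a bonus proportional to the square root of the elapsed time tau since the pair was last tried; kappa is a small tuning constant:

```python
import math

def bonus_reward(r, last_visit, t, kappa=0.001):
    """Dyna-Q+ planning reward: modeled reward r plus an exploration bonus
    kappa * sqrt(tau), where tau = t - last_visit is the time elapsed
    since the state-action pair was last tried."""
    tau = t - last_visit
    return r + kappa * math.sqrt(tau)
```

The bonus makes long-untried pairs look attractive during planning, so the agent periodically re-tests them and can notice when the environment has changed.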

Prioritized sweeping

Variant of Dyna-Q where updates are prioritized based on their potential impact on values, optimizing the computational efficiency of the planning phase.
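
The core data structure is a priority queue over state-action pairs, keyed by the magnitude of their expected value change. A minimal sketch, with a threshold theta (an illustrative default) below which updates are discarded:

```python
import heapq

class SweepQueue:
    """Max-priority queue over (s, a) pairs for prioritized sweeping:
    pairs with the largest expected value change are processed first."""
    def __init__(self, theta=1e-4):
        self.heap, self.theta = [], theta

    def push(self, priority, sa):
        if priority > self.theta:                    # skip negligible updates
            heapq.heappush(self.heap, (-priority, sa))  # negate for max-heap

    def pop(self):
        return heapq.heappop(self.heap)[1]

    def __bool__(self):
        return bool(self.heap)
```

After each update, the predecessors of the updated state are pushed with their own expected change, so large corrections sweep backward through the state space.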

Planning effect

Acceleration of learning observed when the number of planning steps per real step increases, up to a point of diminishing returns.

Algorithm convergence

Property guaranteeing that Dyna-Q's value estimates converge to the optimal values under certain conditions, such as an exact model and infinitely many visits to every state-action pair.

Model error

Discrepancy between the actual behavior of the environment and the predictions of the learned model, which can degrade performance if not managed.

Computational complexity

Computational cost of Dyna-Q, which scales linearly with the size of the experience replay buffer and the number of planning updates per iteration.

Model generalization

Ability to extrapolate the model's predictions to unseen state-action pairs, often implemented using neural networks or other function approximators.

State space sampling

Strategy for selecting simulated experiences from memory during the planning phase, influencing the learning efficiency of Dyna-Q.
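
Two simple strategies, uniform and recency-weighted, might be sketched as follows (this assumes the model dict preserves insertion order, as Python dicts do; the weighting scheme is illustrative):

```python
import random

def sample_pairs(model, k, recency_weighted=False):
    """Select k (s, a) pairs from the stored model for planning updates:
    either uniformly, or weighted toward the most recently stored pairs."""
    pairs = list(model)
    if recency_weighted:
        weights = range(1, len(pairs) + 1)   # later insertions weigh more
        return random.choices(pairs, weights=weights, k=k)
    return [random.choice(pairs) for _ in range(k)]
```
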

Planning function

Algorithmic component that performs repeated updates on stored experiences to refine value estimates without new environmental interaction.

Adaptive learning rate

Mechanism for dynamically adjusting the learning rate in Dyna-Q to optimize convergence, taking into account the variance of real and simulated experiences.
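
One common count-based scheme, shown purely as an illustration, shrinks the step size with the number of visits to each state-action pair:

```python
def adaptive_alpha(counts, s, a, alpha0=1.0):
    """Illustrative adaptive step size: alpha = alpha0 / N(s, a), where
    N(s, a) is the running visit count for the state-action pair."""
    counts[(s, a)] = counts.get((s, a), 0) + 1
    return alpha0 / counts[(s, a)]
```

Early visits produce large updates while frequently visited pairs settle toward stable averages; other schedules (e.g. slower decays) trade off responsiveness against variance.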
