
AI Glossary

The complete glossary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Model-Based Reinforcement Learning

Reinforcement learning approach where the agent builds an internal model of the environment to simulate transitions and generate experiences without real interaction.


Dyna-Q

Hybrid reinforcement learning algorithm that combines direct learning from real experience with planning, using a learned model to generate additional simulated experiences.
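
A minimal tabular Dyna-Q sketch, illustrating both the model-based idea above and the three steps taken per real transition (direct learning, model learning, planning). The Gym-style `env`, the hyperparameters, and the deterministic-environment assumption are all illustrative, not part of the glossary entry.

```python
import random
from collections import defaultdict

def dyna_q(env, episodes=100, n_planning=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)   # action values Q[(s, a)]
    model = {}               # learned model: (s, a) -> (reward, next state, done)
    actions = list(range(env.action_space.n))

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            s_next, r, done, _ = env.step(a)

            # (1) direct learning from the real transition
            target = r + gamma * max(Q[(s_next, b)] for b in actions) * (not done)
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # (2) model learning (deterministic: remember the last outcome)
            model[(s, a)] = (r, s_next, done)

            # (3) planning: n extra updates on simulated transitions
            for _ in range(n_planning):
                (ps, pa), (pr, pn, pd) = random.choice(list(model.items()))
                pt = pr + gamma * max(Q[(pn, b)] for b in actions) * (not pd)
                Q[(ps, pa)] += alpha * (pt - Q[(ps, pa)])

            s = s_next
    return Q
```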


Direct learning

Process of updating action values or policy based solely on real experiences accumulated during interaction with the environment.


Planning in reinforcement learning

Using an environment model to generate synthetic experiences and improve the policy without additional interaction with the real environment.
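
Isolated from the Dyna-Q sketch above, a single planning step might look like this; `Q` and `model` are the tabular structures from that sketch, and all names are illustrative.

```python
import random

def planning_step(Q, model, actions, alpha=0.1, gamma=0.95):
    # sample a previously observed (state, action) pair and replay it
    (s, a), (r, s_next, done) = random.choice(list(model.items()))
    target = r + gamma * max(Q[(s_next, b)] for b in actions) * (not done)
    Q[(s, a)] += alpha * (target - Q[(s, a)])  # no real interaction needed
```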


Transition model

Component of the predictive environment model that estimates the probability distribution of next states given a current state and an action.


Reward model

Learned function that predicts the expected reward for each state-action pair in a reinforcement learning environment.
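
Taken together, the transition model and reward model above can be kept as simple empirical tables. This sketch assumes discrete, hashable states and actions; the class layout is illustrative.

```python
from collections import defaultdict

class TabularModel:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': n}
        self.reward_sum = defaultdict(float)                 # (s, a) -> sum of r
        self.visits = defaultdict(int)                       # (s, a) -> n visits

    def update(self, s, a, r, s_next):
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def transition_probs(self, s, a):
        # empirical estimate of P(s' | s, a) from observed counts
        n = self.visits[(s, a)]
        return {s2: c / n for s2, c in self.counts[(s, a)].items()}

    def expected_reward(self, s, a):
        # empirical estimate of E[r | s, a]
        return self.reward_sum[(s, a)] / self.visits[(s, a)]
```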


Simulated experiences

Samples generated artificially by the internal environment model to accelerate learning without requiring additional real interactions.


Value update

Iterative process of adjusting action-value estimates Q(s,a) based on observed rewards and the values of future states, according to the Bellman equation.
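
In standard notation, the update described here is the one-step Q-learning rule (derived from the Bellman optimality equation), which Dyna-Q applies to both real and simulated transitions:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$

where $\alpha$ is the learning rate and $\gamma$ the discount factor.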


Experience replay buffer

Data structure storing tuples (state, action, reward, next_state) to allow repeated updates during the planning phase.
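
A minimal sketch of such a buffer; the capacity and batch size are illustrative defaults, not values implied by the entry.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest tuples drop off automatically

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        # uniform random sample for repeated planning updates
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```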


Dyna-Q+

Extension of Dyna-Q that adds an exploration bonus based on the time elapsed since each state-action pair was last visited, allowing the agent to detect and adapt to changes in the environment.
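
During planning, Dyna-Q+ augments the simulated reward with a bonus of κ√τ, where τ is the time since the pair was last tried for real. A minimal sketch, with κ purely illustrative:

```python
import math

def bonus_reward(r, last_visit_time, current_time, kappa=1e-3):
    tau = current_time - last_visit_time  # steps since the last real visit
    return r + kappa * math.sqrt(tau)     # long-untried pairs look more attractive
```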


Prioritized sweeping

Variant of Dyna-Q where updates are prioritized based on their potential impact on values, optimizing the computational efficiency of the planning phase.
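
A sketch of this prioritized planning phase, assuming the tabular `Q` and `model` from the Dyna-Q sketch above plus a `predecessors` map from each state to the (state, action) pairs observed to lead into it; `theta` and `n` are illustrative. Python's `heapq` is a min-heap, so priorities are negated.

```python
import heapq
import itertools

_tie = itertools.count()  # tie-breaker so the heap never compares states

def prioritized_planning(Q, model, predecessors, actions, queue,
                         alpha=0.1, gamma=0.95, theta=1e-4, n=10):
    for _ in range(n):
        if not queue:
            break
        _, _, (s, a) = heapq.heappop(queue)  # largest expected value change first
        r, s_next, done = model[(s, a)]
        target = r + gamma * max(Q[(s_next, b)] for b in actions) * (not done)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        # requeue predecessors whose value estimates may now have shifted
        for ps, pa in predecessors.get(s, ()):
            pr, _, pd = model[(ps, pa)]
            p = abs(pr + gamma * max(Q[(s, b)] for b in actions) * (not pd)
                    - Q[(ps, pa)])
            if p > theta:
                heapq.heappush(queue, (-p, next(_tie), (ps, pa)))
```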


Planning effect

Acceleration of learning observed when the number of planning steps per real step increases, up to a point of diminishing returns.


Algorithm convergence

Property guaranteeing that Dyna-Q's value estimates converge to the optimal values under certain conditions, such as an exact model and infinitely many visits to each state-action pair.


Model error

Discrepancy between the actual behavior of the environment and the predictions of the learned model, which can degrade performance if not managed.


Computational complexity

Computational cost of Dyna-Q, which scales linearly with the size of the experience replay buffer and the number of planning updates per iteration.


Model generalization

Ability of the model to extrapolate its predictions to unseen state-action pairs, often implemented using neural networks or other function approximators.
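
A minimal function-approximation sketch using scikit-learn; the placeholder data, feature sizes, and network shape are all illustrative. The fitted network predicts next-state features even for (state, action) inputs it never saw.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 6))  # placeholder (state features + action encoding) inputs
y = rng.random((500, 4))  # placeholder next-state feature targets

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, y)
prediction = model.predict(rng.random((1, 6)))  # generalizes to an unseen input
```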


State space sampling

Strategy for selecting simulated experiences from memory during the planning phase, influencing the learning efficiency of Dyna-Q.


Planning function

Algorithmic component that performs repeated updates on stored experiences to refine value estimates without further interaction with the environment.


Adaptive learning rate

Mechanism for dynamically adjusting the learning rate in Dyna-Q to improve convergence, accounting for the variance of real and simulated experiences.
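
One common scheme is a count-based step size per (state, action) pair; this sketch is illustrative and not the only way to adapt α.

```python
from collections import defaultdict

visits = defaultdict(int)

def adaptive_alpha(s, a):
    visits[(s, a)] += 1
    return 1.0 / visits[(s, a)]  # alpha = 1/n for the n-th update of this pair
```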
