AI Glossary
The complete AI glossary
Model-Based Reinforcement Learning
Reinforcement learning approach where the agent builds an internal model of the environment to simulate transitions and generate experiences without real interaction.
Dyna-Q
Hybrid reinforcement learning algorithm combining direct learning from real experience and planning using a learned model to generate additional simulated experiences.
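A minimal tabular sketch of the Dyna-Q loop, assuming a deterministic environment with hypothetical reset()/step() methods; it shows direct learning, model updating, and planning in one place:

```python
import random
from collections import defaultdict

# Tabular Dyna-Q sketch for a deterministic environment (hypothetical env
# interface: reset() -> state, step(action) -> (next_state, reward, done)).
def dyna_q(env, actions, episodes=100, alpha=0.1, gamma=0.95,
           epsilon=0.1, planning_steps=10):
    Q = defaultdict(float)   # action-value estimates Q[(s, a)]
    model = {}               # learned model: (s, a) -> (next_s, r)

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = random.choice(actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)

            # direct learning from the real experience
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])

            # store the observed transition in the model
            model[(s, a)] = (s2, r)

            # planning: replay simulated experiences drawn from the model
            for _ in range(planning_steps):
                ps, pa = random.choice(list(model))
                ps2, pr = model[(ps, pa)]
                Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in actions) - Q[(ps, pa)])
            s = s2
    return Q
```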
Direct learning
Process of updating action values or the policy based solely on real experiences gathered while interacting with the environment.
Planning in reinforcement learning
Using a learned model of the environment to generate synthetic experiences and improve the policy without additional interaction with the real environment.
Transition model
Component of the predictive environment model that estimates the probability distribution of next states given a current state and an action.
Reward model
Learned function that predicts the expected reward for each state-action pair in a reinforcement learning environment.
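In the tabular case, the transition and reward models are often learned together from visit statistics. A minimal sketch covering both entries (class and method names are hypothetical):

```python
from collections import defaultdict

# Hypothetical tabular model: counts observed successor states per
# (state, action) and averages observed rewards, so it can estimate
# both a stochastic transition model and a reward model.
class TabularModel:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sum = defaultdict(float)                 # (s, a) -> total reward
        self.visits = defaultdict(int)                       # (s, a) -> visit count

    def update(self, s, a, r, s2):
        self.counts[(s, a)][s2] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def transition_probs(self, s, a):
        # empirical estimate of P(s' | s, a); assumes (s, a) was visited
        n = self.visits[(s, a)]
        return {s2: c / n for s2, c in self.counts[(s, a)].items()}

    def expected_reward(self, s, a):
        # running average of observed rewards for (s, a)
        return self.reward_sum[(s, a)] / self.visits[(s, a)]
```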
Simulated experiences
Samples generated artificially by the internal environment model to accelerate learning without requiring additional real interactions.
Value update
Iterative process of adjusting the action-value estimates Q(s,a) using observed rewards and the estimated values of successor states, following the Bellman equation.
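Concretely, this is the one-step Q-learning update Dyna-Q applies in both its direct and planning phases, with learning rate α and discount factor γ:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```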
Experience replay buffer
Data structure storing tuples (state, action, reward, next_state) to allow repeated updates during the planning phase.
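A minimal sketch of such a buffer (interface names are hypothetical); classic tabular Dyna-Q stores its model in this role implicitly, while deep variants sample minibatches from an explicit buffer:

```python
import random
from collections import deque

# Fixed-capacity replay buffer: stores (state, action, reward, next_state)
# tuples and returns uniform random minibatches for planning updates.
class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest tuples evicted first

    def add(self, s, a, r, s2):
        self.buffer.append((s, a, r, s2))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```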
Dyna-Q+
Extension of Dyna-Q that adds an exploration bonus based on the time elapsed since a state-action pair was last visited, letting the agent detect and adapt to changes in the environment.
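A common form of the bonus adds a term to the simulated reward, where τ is the number of time steps since (s, a) was last tried in the real environment and κ is a small constant:

```latex
r^{+} = r + \kappa \sqrt{\tau}
```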
Prioritized sweeping
Variant of Dyna-Q in which planning updates are ordered by the expected magnitude of their effect on value estimates, making the planning phase more computationally efficient.
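A condensed sketch of the priority-queue mechanism (names hypothetical; Q and model as in the Dyna-Q sketch above, plus a predecessors map from each state to the (state, action) pairs observed to lead into it):

```python
import heapq
import itertools

# One prioritized-sweeping planning phase. model[(s, a)] holds
# (next_state, reward); theta is the minimum priority worth processing.
def prioritized_planning(Q, model, predecessors, actions, s, a,
                         alpha=0.1, gamma=0.95, theta=1e-4, n=10):
    tie = itertools.count()  # tiebreaker so the heap never compares states

    def priority(s, a):
        s2, r = model[(s, a)]
        return abs(r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])

    pq = []
    if priority(s, a) > theta:
        heapq.heappush(pq, (-priority(s, a), next(tie), s, a))  # max-heap via negation

    for _ in range(n):
        if not pq:
            break
        _, _, ps, pa = heapq.heappop(pq)
        ps2, pr = model[(ps, pa)]
        Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in actions) - Q[(ps, pa)])
        # states leading into ps may now have stale values: re-queue them
        for qs, qa in predecessors.get(ps, ()):
            p = priority(qs, qa)
            if p > theta:
                heapq.heappush(pq, (-p, next(tie), qs, qa))
```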
Planning effect
Acceleration of learning observed when the number of planning steps per real step increases, up to a point of diminishing returns.
Algorithm convergence
Property guaranteeing that Dyna-Q's value estimates converge to the optimal values, provided the learned model is exact and every state-action pair is visited infinitely often.
Model error
Discrepancy between the actual behavior of the environment and the predictions of the learned model, which can degrade performance if not managed.
Computational complexity
Computational cost of Dyna-Q, which grows linearly with the number of planning updates performed per real step; memory grows with the number of stored state-action pairs.
Model generalization
Ability of the model to extrapolate its predictions to unseen state-action pairs, often implemented with neural networks or other function approximators.
State space sampling
Strategy for selecting simulated experiences from memory during the planning phase, influencing the learning efficiency of Dyna-Q.
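Two simple strategies, sketched with hypothetical names; uniform sampling is what the classic algorithm uses, while recency-biased sampling concentrates planning effort where the agent has been lately:

```python
import random

def sample_uniform(model):
    # every stored (state, action) pair is equally likely
    return random.choice(list(model))

def sample_recent(history, window=100):
    # bias toward recently visited pairs; history is an ordered list of (s, a)
    return random.choice(history[-window:])
```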
Planning function
Algorithmic component that performs repeated updates on stored experiences to refine value estimates without new environmental interaction.
Adaptive learning rate
Mechanism for dynamically adjusting the learning rate in Dyna-Q, for example to account for the differing reliability of real and simulated experiences and to stabilize convergence.
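One simple schedule, sketched under the assumption that per-pair visit counts are tracked; decaying α per state-action pair is a standard way to damp noise as estimates mature:

```python
# Hypothetical per-pair schedule: alpha decays with the visit count of
# (s, a), so early high-variance updates move estimates more than later ones.
def alpha_for(visits, s, a):
    return 1.0 / (1.0 + visits[(s, a)])
```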