AI Glossary
The complete AI glossary
Model-Based Reinforcement Learning
Reinforcement learning approach where the agent builds an internal model of the environment to simulate transitions and generate experiences without real interaction.
Dyna-Q
Hybrid reinforcement learning algorithm combining direct learning from real experience and planning using a learned model to generate additional simulated experiences.
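A minimal tabular sketch of the Dyna-Q loop, assuming a deterministic environment with hypothetical reset()/step() methods; it shows direct learning, model updating, and planning in one place:

```python
import random
from collections import defaultdict

# Tabular Dyna-Q sketch for a deterministic environment (hypothetical env
# interface: reset() -> state, step(action) -> (next_state, reward, done)).
def dyna_q(env, actions, episodes=100, alpha=0.1, gamma=0.95,
           epsilon=0.1, planning_steps=10):
    Q = defaultdict(float)   # action-value estimates Q[(s, a)]
    model = {}               # learned model: (s, a) -> (next_s, r)

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = random.choice(actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)

            # direct learning from the real experience
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])

            # store the observed transition in the model
            model[(s, a)] = (s2, r)

            # planning: replay simulated experiences drawn from the model
            for _ in range(planning_steps):
                ps, pa = random.choice(list(model))
                ps2, pr = model[(ps, pa)]
                Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in actions) - Q[(ps, pa)])
            s = s2
    return Q
```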
Direct learning
Process of updating action values or the policy based solely on real experiences gathered while interacting with the environment.
Planning in reinforcement learning
Using a learned model of the environment to generate synthetic experiences and improve the policy without additional interaction with the real environment.
Transition model
Component of the predictive environment model that estimates the probability distribution of next states given a current state and an action.
Reward model
Learned function that predicts the expected reward for each state-action pair in a reinforcement learning environment.
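In the tabular case, the transition and reward models are often learned together from visit statistics. A minimal sketch covering both entries (class and method names are hypothetical):

```python
from collections import defaultdict

# Hypothetical tabular model: counts observed successor states per
# (state, action) and averages observed rewards, so it can estimate
# both a stochastic transition model and a reward model.
class TabularModel:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sum = defaultdict(float)                 # (s, a) -> total reward
        self.visits = defaultdict(int)                       # (s, a) -> visit count

    def update(self, s, a, r, s2):
        self.counts[(s, a)][s2] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def transition_probs(self, s, a):
        # empirical estimate of P(s' | s, a); assumes (s, a) was visited
        n = self.visits[(s, a)]
        return {s2: c / n for s2, c in self.counts[(s, a)].items()}

    def expected_reward(self, s, a):
        # running average of observed rewards for (s, a)
        return self.reward_sum[(s, a)] / self.visits[(s, a)]
```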
Simulated experiences
Samples generated artificially by the internal environment model to accelerate learning without requiring additional real interactions.
Value update
Iterative process of adjusting the action-value estimates Q(s,a) using observed rewards and the estimated values of successor states, following the Bellman equation.
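Concretely, this is the one-step Q-learning update Dyna-Q applies in both its direct and planning phases, with learning rate α and discount factor γ:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```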
Experience replay buffer
Data structure storing tuples (state, action, reward, next_state) to allow repeated updates during the planning phase.
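A minimal sketch of such a buffer (interface names are hypothetical); classic tabular Dyna-Q stores its model in this role implicitly, while deep variants sample minibatches from an explicit buffer:

```python
import random
from collections import deque

# Fixed-capacity replay buffer: stores (state, action, reward, next_state)
# tuples and returns uniform random minibatches for planning updates.
class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest tuples evicted first

    def add(self, s, a, r, s2):
        self.buffer.append((s, a, r, s2))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```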
Dyna-Q+
Extension of Dyna-Q that adds an exploration bonus based on the time elapsed since a state-action pair was last visited, letting the agent detect and adapt to changes in the environment.
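A common form of the bonus adds a term to the simulated reward, where τ is the number of time steps since (s, a) was last tried in the real environment and κ is a small constant:

```latex
r^{+} = r + \kappa \sqrt{\tau}
```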
Prioritized sweeping
Variant of Dyna-Q in which planning updates are ordered by the expected magnitude of their effect on value estimates, making the planning phase more computationally efficient.
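A condensed sketch of the priority-queue mechanism (names hypothetical; Q and model as in the Dyna-Q sketch above, plus a predecessors map from each state to the (state, action) pairs observed to lead into it):

```python
import heapq
import itertools

# One prioritized-sweeping planning phase. model[(s, a)] holds
# (next_state, reward); theta is the minimum priority worth processing.
def prioritized_planning(Q, model, predecessors, actions, s, a,
                         alpha=0.1, gamma=0.95, theta=1e-4, n=10):
    tie = itertools.count()  # tiebreaker so the heap never compares states

    def priority(s, a):
        s2, r = model[(s, a)]
        return abs(r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])

    pq = []
    if priority(s, a) > theta:
        heapq.heappush(pq, (-priority(s, a), next(tie), s, a))  # max-heap via negation

    for _ in range(n):
        if not pq:
            break
        _, _, ps, pa = heapq.heappop(pq)
        ps2, pr = model[(ps, pa)]
        Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in actions) - Q[(ps, pa)])
        # states leading into ps may now have stale values: re-queue them
        for qs, qa in predecessors.get(ps, ()):
            p = priority(qs, qa)
            if p > theta:
                heapq.heappush(pq, (-p, next(tie), qs, qa))
```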
Planning effect
Acceleration of learning observed when the number of planning steps per real step increases, up to a point of diminishing returns.
Algorithm convergence
Property guaranteeing that Dyna-Q's value estimates converge to the optimal values, provided the learned model is exact and every state-action pair is visited infinitely often.
Model error
Discrepancy between the actual behavior of the environment and the predictions of the learned model, which can degrade performance if not managed.
Computational complexity
Computational cost of Dyna-Q, which grows linearly with the number of planning updates performed per real step; memory grows with the number of stored state-action pairs.
Model generalization
Ability of the model to extrapolate its predictions to unseen state-action pairs, often implemented with neural networks or other function approximators.
State space sampling
Strategy for selecting simulated experiences from memory during the planning phase, influencing the learning efficiency of Dyna-Q.
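Two simple strategies, sketched with hypothetical names; uniform sampling is what the classic algorithm uses, while recency-biased sampling concentrates planning effort where the agent has been lately:

```python
import random

def sample_uniform(model):
    # every stored (state, action) pair is equally likely
    return random.choice(list(model))

def sample_recent(history, window=100):
    # bias toward recently visited pairs; history is an ordered list of (s, a)
    return random.choice(history[-window:])
```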
Planning function
Algorithmic component that performs repeated updates on stored experiences to refine value estimates without new environmental interaction.
Adaptive learning rate
Mechanism for dynamically adjusting the learning rate in Dyna-Q, for example to account for the differing reliability of real and simulated experiences and to stabilize convergence.
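One simple schedule, sketched under the assumption that per-pair visit counts are tracked; decaying α per state-action pair is a standard way to damp noise as estimates mature:

```python
# Hypothetical per-pair schedule: alpha decays with the visit count of
# (s, a), so early high-variance updates move estimates more than later ones.
def alpha_for(visits, s, a):
    return 1.0 / (1.0 + visits[(s, a)])
```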