Model-Based Inverse Reinforcement Learning

📖

Term

Model-Based Inverse Reinforcement Learning

An approach that infers a reward function from expert demonstrations using an environment model to generate and evaluate plausible alternative trajectories.

📖

Term

Reward Function Inference

The process of estimating an agent's underlying reward function from its observed behavior, often formulated as maximizing the likelihood of the demonstrated trajectories.

📖

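Term

Environmental Dynamics Model

A learned model that predicts the next state (and, in standard model-based reinforcement learning, the reward) given the current state and action. In inverse reinforcement learning the reward is the quantity being inferred, so the model is used mainly to simulate state transitions along trajectories.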

📖

Term

Plausible Trajectory Generation

The use of an environment model to create state-action sequences that are consistent with system dynamics and observed policies, serving as synthetic data for inference.

📖

Term

Tendency of an agent trained through inverse reinforcement learning to over-imitate the demonstrated actions without generalizing to unseen states. Mitigating it requires using a model to explore beyond the expert data.

📖

Term

Trajectory Likelihood Optimization

Method of adjusting the reward function's parameters to maximize the likelihood of the observed expert trajectories under the trajectory distribution induced by the inferred reward.
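
As a rough illustration, the sketch below fits a reward by gradient ascent on the expert log-likelihood. It assumes a small, fully enumerated set of candidate trajectories, a reward that is linear in hypothetical trajectory features, and the exponential trajectory distribution P(τ) ∝ exp(θ·φ(τ)). The names `trajectory_probs`, `fit_reward`, `features`, and `expert_idx` are illustrative, and the toy data are made up.

```python
import numpy as np

def trajectory_probs(theta, features):
    """P(tau) proportional to exp(theta . phi(tau)) over an enumerated trajectory set."""
    scores = features @ theta
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    return weights / weights.sum()

def expert_log_likelihood(theta, features, expert_idx):
    """Mean log-probability of the expert trajectories under the current reward."""
    return float(np.mean(np.log(trajectory_probs(theta, features)[expert_idx])))

def fit_reward(features, expert_idx, lr=0.5, steps=500):
    """Gradient ascent on the expert log-likelihood (assumed linear reward)."""
    theta = np.zeros(features.shape[1])
    expert_mean = features[expert_idx].mean(axis=0)
    for _ in range(steps):
        model_mean = trajectory_probs(theta, features) @ features
        theta += lr * (expert_mean - model_mean)  # gradient: expert minus model features
    return theta

# Made-up toy data: four candidate trajectories with two features each.
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
expert_idx = np.array([0, 0, 2, 1])  # indices of the demonstrated trajectories
theta = fit_reward(features, expert_idx)
```

With a linear reward, the gradient reduces to the difference between the expert's average features and the features expected under the current trajectory distribution, so the update stops once the two match.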

📖

Term

Ambiguous Reward Function

Problem where multiple different reward functions can equally explain the same expert demonstrations, requiring constraints or priors to resolve the ambiguity.
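
The ambiguity can be seen in a tiny example. The sketch below runs value iteration on a hypothetical three-state chain (the MDP, the function name `greedy_policy`, and the reward values are all assumptions made for illustration) and compares the greedy policy under a reward with the one under a rescaled and shifted copy of it.

```python
import numpy as np

def greedy_policy(P, R, gamma=0.9, iters=500):
    """Value iteration; P has shape (A, S, S), R has shape (S,). Returns best action per state."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = R[None, :] + gamma * (P @ V)  # Q[a, s]
        V = Q.max(axis=0)
    return Q.argmax(axis=0)

# Hypothetical 3-state chain: action 0 moves left, action 1 moves right.
P = np.zeros((2, 3, 3))
for s in range(3):
    P[0, s, max(s - 1, 0)] = 1.0
    P[1, s, min(s + 1, 2)] = 1.0

R = np.array([0.0, 0.0, 1.0])
policy_a = greedy_policy(P, R)
policy_b = greedy_policy(P, 2.0 * R + 3.0)  # rescaled and shifted reward
```

In this discounted setting without terminal states, both rewards yield the same greedy policy, so demonstrations from that policy alone cannot tell them apart.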

📖

Term

Synthetic Trajectory Set

Collection of trajectories generated by the environment model, used to enrich demonstration data and improve the robustness of reward inference.

📖

Term

Environmental Model Error

Discrepancy between the actual environment dynamics and those predicted by the learned model, which can bias reward inference if not corrected.
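
A minimal sketch of how such a discrepancy accumulates over a simulated trajectory is given below. The scalar dynamics, the 5% model error, and the constant policy are hypothetical choices made only to keep the example short.

```python
import numpy as np

def rollout(step_fn, policy, s0, horizon):
    """Simulate `horizon` steps and return the visited states, including s0."""
    states = [s0]
    for _ in range(horizon):
        s = states[-1]
        states.append(step_fn(s, policy(s)))
    return np.array(states)

def true_step(s, a):
    return s + a          # assumed ground-truth dynamics

def model_step(s, a):
    return 1.05 * s + a   # hypothetical learned model with a 5% error on the state term

def policy(s):
    return 0.1            # constant action, for illustration only

true_states = rollout(true_step, policy, s0=1.0, horizon=20)
model_states = rollout(model_step, policy, s0=1.0, horizon=20)
gap = np.abs(model_states - true_states)  # per-step error of the model rollout
```

In this toy setup the gap starts at zero and grows at every step, which is one reason a small one-step error can still distort rewards inferred from long simulated trajectories.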

📖

Term

Backpropagation through Model

Technique for computing gradients of the inference objective with respect to the reward function's parameters by propagating error signals through a differentiable dynamics model.

📖

Term

Policy Space

Set of all policies π(a|s) that the agent can adopt; inverse reinforcement learning searches this set for a policy that is optimal under the inferred reward and consistent with the demonstrations.

📖

Term

Model-Based Monte Carlo Planning

Method using stochastic simulations of the environmental model to evaluate different candidate reward functions and select the one that best explains demonstrations.

📖

Term

Regularization Cost Function

Term added to the inference objective to penalize complex or unrealistic reward functions, favoring simpler and more generalizable solutions.

📖

Term

Posterior Distribution over Rewards

Bayesian approach that maintains a probability distribution over possible reward functions rather than a point estimate, which makes it possible to quantify uncertainty about the reward.
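
In symbols, the posterior follows from Bayes' rule. The likelihood shown here is one common Boltzmann-rational choice and is an assumption, not something this glossary specifies: D is the set of demonstrated state-action pairs, Q^*_R is the optimal action-value function under reward R, and α is a hypothetical confidence parameter.

```latex
P(R \mid D) \;\propto\; P(D \mid R)\, P(R),
\qquad
P(D \mid R) \;=\; \prod_{(s,a) \in D}
\frac{\exp\bigl(\alpha\, Q^{*}_{R}(s,a)\bigr)}
     {\sum_{a'} \exp\bigl(\alpha\, Q^{*}_{R}(s,a')\bigr)}
```

Larger values of α model an expert who deviates less often from the optimal action.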

📖

Term

Simulation Horizon

Maximum number of future steps simulated by the environment model when generating trajectories. A longer horizon lets trajectories reach further beyond the demonstrated states but raises computational cost and lets model errors compound.

📖

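Term

Model-Based Importance Sampling

Technique that uses the model to generate trajectories from a proposal distribution, then weights each trajectory by the ratio of its likelihood under the target distribution (for example, the one induced by the current reward estimate) to its likelihood under the proposal.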
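
The weighting step can be sketched as follows, assuming the target distribution is p(τ) ∝ exp(R(τ)) and that the proposal's log-probabilities are available for every sampled trajectory. The function names and arguments are hypothetical, and the weights are self-normalized so the unknown normalizing constant cancels.

```python
import numpy as np

def importance_weights(returns, proposal_log_probs):
    """Self-normalized weights w_i proportional to exp(R(tau_i)) / q(tau_i)."""
    log_w = np.asarray(returns, dtype=float) - np.asarray(proposal_log_probs, dtype=float)
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    return w / w.sum()

def expected_features(features, returns, proposal_log_probs):
    """Importance-weighted estimate of the feature expectation under the target."""
    weights = importance_weights(returns, proposal_log_probs)
    return weights @ np.asarray(features, dtype=float)
```

When the proposal already matches the target, all weights come out equal; the further the proposal drifts from the target, the more the weights concentrate on a few trajectories and the noisier the estimate becomes.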

📖

Term

Maximum Entropy Method

Inference principle that, among all trajectory distributions consistent with the demonstrations, chooses the least committal (maximum entropy) one, which resolves reward ambiguity and helps avoid overfitting to the demonstrations.
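
One standard way to write this down is shown below, under assumptions not stated in this glossary: deterministic dynamics, a reward that is linear in trajectory features φ(τ) with parameters θ, and an empirical feature average taken over the demonstration set D.

```latex
\max_{P}\; -\sum_{\tau} P(\tau)\,\log P(\tau)
\quad \text{s.t.} \quad
\sum_{\tau} P(\tau)\,\phi(\tau) = \frac{1}{|D|}\sum_{\tau_i \in D} \phi(\tau_i),
\qquad
\sum_{\tau} P(\tau) = 1

\;\Longrightarrow\;
P(\tau \mid \theta) = \frac{\exp\bigl(\theta^{\top}\phi(\tau)\bigr)}{Z(\theta)},
\qquad
Z(\theta) = \sum_{\tau'} \exp\bigl(\theta^{\top}\phi(\tau')\bigr)
```

Here θ appears as the Lagrange multiplier of the feature-matching constraint, so trajectories with higher reward are exponentially more likely while equally rewarded trajectories remain equally probable.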

AI Glossary