AI Glossary
The Complete Dictionary of Artificial Intelligence
Model-Based Inverse Reinforcement Learning
An approach that infers a reward function from expert demonstrations using an environment model to generate and evaluate plausible alternative trajectories.
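A minimal sketch of this loop, assuming a reward that is linear in trajectory features; `rollout` and `feats` are hypothetical helpers (a rollout sketch appears under "Plausible Trajectory Generation" below), and this is illustrative rather than any specific published algorithm:

```python
# Illustrative model-based IRL loop: generate alternatives with the model,
# then nudge the reward so expert trajectories out-score them.
import numpy as np

def model_based_irl(expert_feats, model, policy, start, horizon, steps=100):
    theta = np.zeros(expert_feats.shape[1])           # reward parameters
    for _ in range(steps):
        # 1. Use the environment model to generate plausible alternatives.
        sampled = [rollout(model, policy, start, horizon) for _ in range(32)]
        sampled_feats = np.array([feats(tau) for tau in sampled])
        # 2. Adjust the reward so experts out-score the model's alternatives.
        grad = expert_feats.mean(axis=0) - sampled_feats.mean(axis=0)
        theta += 0.05 * grad
    return theta
```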
Reward Function Inference
The process of estimating an agent's underlying reward function by observing its behavior, often formulated as maximizing the likelihood of the demonstrated trajectories.
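One common formalization, assuming a Boltzmann (soft-optimal) model of the expert, where r_theta is a parameterized reward and tau a trajectory:

```latex
\theta^{*} = \arg\max_{\theta} \sum_{i} \log p(\tau_i \mid \theta),
\qquad
p(\tau \mid \theta) \propto \exp\!\Big(\sum_{t} r_{\theta}(s_t, a_t)\Big)
```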
Environmental Dynamics Model
A learned model that predicts the next state and reward given a current state and action, used to simulate trajectories in model-based reinforcement learning.
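A minimal illustrative sketch, assuming continuous states and actions and a simple least-squares fit; the class and method names are invented for this example:

```python
import numpy as np

class LinearDynamicsModel:
    """Predicts (next_state, reward) from (state, action) via least squares."""

    def fit(self, states, actions, next_states, rewards):
        # Stack inputs with a bias column; solve one linear system for both
        # the next-state and reward predictions.
        X = np.hstack([states, actions, np.ones((len(states), 1))])
        Y = np.hstack([next_states, np.asarray(rewards).reshape(-1, 1)])
        self.W, *_ = np.linalg.lstsq(X, Y, rcond=None)

    def predict(self, state, action):
        x = np.concatenate([state, action, [1.0]])
        y = x @ self.W
        return y[:-1], y[-1]          # predicted next state, predicted reward
```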
Plausible Trajectory Generation
The use of an environment model to create state-action sequences that are consistent with system dynamics and observed policies, serving as synthetic data for inference.
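A rollout sketch under the assumed `model.predict(state, action)` interface from the previous entry and a hypothetical stochastic policy `policy(state)`:

```python
def rollout(model, policy, start_state, horizon):
    # Simulate one trajectory of (state, action, reward) triples in the model.
    trajectory, state = [], start_state
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = model.predict(state, action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory
```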
Imitation Bias
The tendency of a policy learned through inverse reinforcement learning to over-imitate demonstrated actions without generalizing to unseen states; environment models mitigate this by enabling exploration beyond the expert data.
Trajectory Likelihood Optimization
Method of adjusting the reward function to maximize the probability that observed expert trajectories are optimal under the inferred reward.
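A sketch under the Boltzmann trajectory model above, approximating the partition function with a finite set of model-generated trajectories; all array names are illustrative:

```python
import numpy as np

def loglik_and_grad(theta, expert_feats, sampled_feats):
    # expert_feats: (N, d) feature counts of expert trajectories
    # sampled_feats: (M, d) feature counts of model-generated trajectories
    logits = sampled_feats @ theta
    m = logits.max()
    logZ = np.log(np.exp(logits - m).sum()) + m       # stable log-partition
    loglik = (expert_feats @ theta - logZ).mean()
    probs = np.exp(logits - logZ)                     # softmax over samples
    grad = expert_feats.mean(axis=0) - probs @ sampled_feats
    return loglik, grad
```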
Ambiguous Reward Function
Problem where multiple different reward functions can equally explain the same expert demonstrations, requiring constraints or priors to resolve the ambiguity.
Synthetic Trajectory Set
Collection of trajectories generated by the environment model, used to enrich demonstration data and improve the robustness of reward inference.
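Building such a set is a straightforward extension of the rollout sketch above; `start_sampler` is a hypothetical start-state distribution:

```python
def synthetic_set(model, policy, start_sampler, horizon, n_trajs=100):
    # Collect repeated model rollouts into a synthetic trajectory set.
    return [rollout(model, policy, start_sampler(), horizon)
            for _ in range(n_trajs)]
```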
Environmental Model Error
Discrepancy between the actual environment dynamics and those predicted by the learned model, which can bias reward inference if not corrected.
Backpropagation through Model
Technique for computing gradients of the reward function with respect to its parameters by propagating the error through the differentiable dynamics model.
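An illustrative PyTorch sketch: here the simulated return is differentiated through the model with respect to the action sequence; the same machinery yields gradients for reward or policy parameters whenever they influence the rollout. The model and all dimensions are placeholders:

```python
import torch

dynamics = torch.nn.Linear(6, 4)                 # differentiable dynamics model
state = torch.randn(4)                           # initial state
actions = torch.randn(5, 2, requires_grad=True)  # candidate action sequence
theta = torch.randn(4)                           # linear reward weights

total_return = torch.tensor(0.0)
for a in actions:
    state = dynamics(torch.cat([state, a]))      # simulate one model step
    total_return = total_return + state @ theta  # reward r(s) = theta . s

total_return.backward()                          # chain rule through the model
print(actions.grad)                              # how each action moves the return
```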
Policy Space
Set of all possible policies π(a|s) that the agent can adopt; inverse reinforcement learning searches this space for the optimal policy compatible with the demonstrations.
Model-Based Monte Carlo Planning
Method using stochastic simulations of the environmental model to evaluate different candidate reward functions and select the one that best explains demonstrations.
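One hedged way to realize this: score each candidate reward by how much the expert's return exceeds that of Monte Carlo rollouts from the model (reusing the `rollout` sketch above); the margin criterion here is a simplification, not a canonical algorithm:

```python
import numpy as np

def score_candidate(reward_fn, expert_trajs, model, policy, start, horizon, n=100):
    # Average expert return under the candidate reward.
    expert_return = np.mean([sum(reward_fn(s, a) for s, a, _ in tau)
                             for tau in expert_trajs])
    # Average return of stochastic model rollouts under the same reward.
    sim_returns = [sum(reward_fn(s, a) for s, a, _ in
                       rollout(model, policy, start, horizon))
                   for _ in range(n)]
    return expert_return - np.mean(sim_returns)   # margin over simulations

# best = max(candidates, key=lambda r: score_candidate(r, demos, model, pi, s0, H))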
Regularization Cost Function
Term added to the inference objective to penalize complex or unrealistic reward functions, favoring simpler and more generalizable solutions.
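A sketch adding an L2 penalty to the likelihood objective from the earlier `loglik_and_grad` sketch; the weight `lam` is an illustrative hyperparameter:

```python
import numpy as np

def regularized_objective(theta, expert_feats, sampled_feats, lam=0.1):
    loglik, grad = loglik_and_grad(theta, expert_feats, sampled_feats)
    objective = loglik - lam * np.dot(theta, theta)  # penalize complex rewards
    grad = grad - 2.0 * lam * theta                  # gradient of the penalty
    return objective, grad
```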
Posterior Distribution over Rewards
Bayesian approach that maintains a probability distribution over possible reward functions rather than a point estimate, making it possible to quantify uncertainty about the inferred reward.
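A minimal Metropolis-Hastings sketch in the spirit of Bayesian IRL, assuming a Gaussian prior and a black-box trajectory log-likelihood `loglik(theta)`:

```python
import numpy as np

def posterior_samples(loglik, dim, n_samples=1000, step=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    theta = np.zeros(dim)
    logp = loglik(theta) - 0.5 * theta @ theta        # Gaussian prior
    samples = []
    for _ in range(n_samples):
        prop = theta + step * rng.standard_normal(dim)
        logp_prop = loglik(prop) - 0.5 * prop @ prop
        if np.log(rng.random()) < logp_prop - logp:   # accept/reject step
            theta, logp = prop, logp_prop
        samples.append(theta.copy())
    return np.array(samples)                          # approximates p(theta | D)
```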
Simulation Horizon
Maximum number of future steps simulated by the environmental model when generating trajectories, influencing the balance between exploration and computational cost.
Model-Based Importance Sampling
Technique using the model to generate trajectories from a proposal distribution, then weighting them by their likelihood under the expert policy.
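A sketch of the per-trajectory weight, assuming hypothetical per-step action probabilities `pi_expert_prob` and `q_prob` for the expert and proposal policies:

```python
import numpy as np

def importance_weight(trajectory, pi_expert_prob, q_prob):
    # Product of per-step probability ratios, accumulated in log space.
    log_w = 0.0
    for state, action, _ in trajectory:
        log_w += np.log(pi_expert_prob(state, action))  # target (expert) policy
        log_w -= np.log(q_prob(state, action))          # proposal policy
    return np.exp(log_w)

# Self-normalized estimate of an expert-policy expectation:
# sum(w_i * f(tau_i)) / sum(w_i) over model-generated trajectories tau_i.
```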
Maximum Entropy Method
Inference principle that, among all reward functions that explain the demonstrations, selects the one whose induced trajectory distribution has maximum entropy (i.e., is least committal), thereby avoiding overfitting.
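A sketch of the resulting update, whose gradient is the gap between expert feature expectations and those induced by the current reward (reusing `loglik_and_grad` from above); the learning rate and iteration count are illustrative:

```python
import numpy as np

def maxent_irl(expert_feats, sampled_feats, lr=0.05, iters=200):
    theta = np.zeros(expert_feats.shape[1])
    for _ in range(iters):
        _, grad = loglik_and_grad(theta, expert_feats, sampled_feats)
        theta += lr * grad            # ascend the maximum-entropy likelihood
    return theta
```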