AI Glossary
The complete dictionary of Artificial Intelligence
Inverse Reinforcement Learning
Learning method where the agent infers the reward function from expert demonstrations rather than receiving explicit rewards.
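As a compact illustration (a standard formulation, not part of this entry): IRL seeks a reward function R under which the expert's policy \pi_E performs at least as well as any other policy \pi:

    \mathbb{E}_{\pi_E}\Big[\sum_t \gamma^t R(s_t, a_t)\Big] \;\ge\; \mathbb{E}_{\pi}\Big[\sum_t \gamma^t R(s_t, a_t)\Big] \quad \text{for every policy } \pi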
Maximum Entropy IRL
Variant of IRL that assumes the expert follows the maximum-entropy probability distribution among all trajectory distributions consistent with the demonstrations, which resolves the ambiguity between reward functions that explain the same behavior.
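Illustrative formula (the usual maximum-entropy trajectory model; R(\tau) is the cumulative reward of trajectory \tau and Z a normalizing constant): trajectories become exponentially more likely as their reward increases,

    P(\tau \mid R) = \frac{\exp(R(\tau))}{Z}, \qquad Z = \sum_{\tau'} \exp(R(\tau'))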
Behavioral Cloning
Supervised learning approach that directly learns to imitate expert actions without explicitly inferring the reward function.
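A minimal behavioral-cloning sketch, assuming expert states and discrete actions are already available as arrays and that PyTorch is used; the function name, network size, and hyperparameters are illustrative only:

    import torch
    import torch.nn as nn

    def behavioral_cloning(states, actions, n_actions, epochs=50, lr=1e-3):
        """Fit a policy network to expert (state, action) pairs by plain supervised learning."""
        X = torch.as_tensor(states, dtype=torch.float32)   # expert states
        y = torch.as_tensor(actions, dtype=torch.long)      # expert (discrete) actions
        policy = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(),
                               nn.Linear(64, n_actions))
        opt = torch.optim.Adam(policy.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(policy(X), y)   # imitate actions directly; no reward is inferred
            loss.backward()
            opt.step()
        return policy

The loss depends only on the expert's recorded actions, which is what distinguishes this approach from IRL.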
Expert Trajectory
Sequence of states and actions recorded from an expert, representing an optimal or near-optimal solution to the task.
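For example (a hypothetical grid-world trajectory; the states and action labels are invented for illustration):

    # One expert trajectory stored as a sequence of (state, action) pairs.
    expert_trajectory = [
        ((0, 0), "right"),
        ((0, 1), "right"),
        ((0, 2), "up"),
        ((1, 2), "up"),   # the expert reaches the goal state (2, 2)
    ]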
Policy Equivalence
Principle that multiple reward functions can lead to the same optimal policy, creating ambiguity in IRL.
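A well-known instance of this ambiguity is potential-based reward shaping (\Phi is any function of the state): the shaped reward below induces exactly the same optimal policy as R, so demonstrations alone cannot tell the two apart.

    R'(s, a, s') = R(s, a, s') + \gamma \Phi(s') - \Phi(s)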
Bayesian Inverse Reinforcement Learning
IRL approach using Bayesian inference to estimate a distribution over possible reward functions.
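Schematically (a standard formulation; the Boltzmann likelihood with rationality parameter \beta is one common modelling choice, not the only one):

    P(R \mid D) \;\propto\; P(D \mid R)\, P(R), \qquad P(D \mid R) \;\propto\; \prod_{(s,a) \in D} \exp\big(\beta\, Q^*_R(s, a)\big)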
Preference Cost
Transformation of the reward function into a cost function, where the agent learns to minimize total cost while following demonstrations.
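In the simplest case the transformation is just a sign change, C(s, a) = -R(s, a), so minimizing expected cumulative cost is equivalent to maximizing expected cumulative reward.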
Adversarial Inverse Reinforcement Learning
IRL method based on an adversarial game in which a generator learns the policy while a discriminator distinguishes expert trajectories from trajectories produced by that policy.
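For reference, one common discriminator form in adversarial IRL (as in AIRL, Fu et al. 2018; f_\theta is a learned reward-like function and \pi the current policy):

    D_\theta(s, a) = \frac{\exp(f_\theta(s, a))}{\exp(f_\theta(s, a)) + \pi(a \mid s)}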
Active Inverse Reinforcement Learning
Variant of IRL where the agent can query the expert to obtain additional demonstrations and reduce uncertainty.
Objective Function Inference
Mathematical process of deducing the underlying objective function from observations of the expert's behavior.
Imitation Bias
Tendency of the agent to over-imitate the expert's actions without understanding the underlying intention, leading to poor generalization.
Reinforcement Learning with Expert Feedback
Combination of RL and IRL in which a model is first trained on expert data and then refined with human feedback.
Feature Function
Function that maps state-action pairs to a feature space, used to represent the reward function as a linear combination of features.
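In the linear case (w is the learned weight vector and \phi the feature function):

    R(s, a) = w^\top \phi(s, a)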
Multi-task Inverse Reinforcement Learning
Extension of IRL where multiple tasks are learned simultaneously by sharing knowledge between reward functions.