AI Glossary
The Complete Dictionary of Artificial Intelligence
Model-Based Imitation Learning
Approach where the agent first learns a dynamic model of the environment, then uses this model to plan and generalize behaviors imitated from expert demonstrations.
Dynamic Model
Learned mathematical representation of the environment's state transitions, i.e., the probability P(s'|s, a) of reaching a new state s' after taking action a in state s.
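As an illustration, the transition probability P(s'|s, a) can be estimated in a small discrete environment by counting observed transitions. This is a minimal sketch, not a reference implementation: the function name `fit_tabular_dynamics` and the toy transitions are hypothetical, and practical dynamic models are usually parametric (for example, neural networks).

```python
from collections import Counter, defaultdict

def fit_tabular_dynamics(transitions):
    """Estimate P(s'|s, a) as normalized counts of observed (s, a, s') triples."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1
    model = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        model[key] = {s_next: n / total for s_next, n in next_counts.items()}
    return model

# Hypothetical toy data: (state, action, next_state) triples.
transitions = [
    ("s0", "right", "s1"),
    ("s0", "right", "s1"),
    ("s0", "right", "s0"),
    ("s0", "right", "s1"),
    ("s1", "left", "s0"),
]
model = fit_tabular_dynamics(transitions)
print(model[("s0", "right")])
```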
Counterfactual Reasoning Inference
Method of inferring the expert's reward function by comparing demonstrated trajectories with close counterfactual trajectories to identify the expert's preferences.
Model-Based Planning
Process of using the learned dynamic and reward models to simulate candidate action sequences and select the one with the highest predicted return, without direct interaction with the real environment.
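A minimal sketch of this idea, assuming a deterministic one-dimensional toy environment: the functions `dynamics` and `reward` below are hand-written stand-ins for learned models, and the planner simply enumerates every action sequence up to a short horizon. Exhaustive enumeration only works for tiny problems; it is used here purely for clarity.

```python
from itertools import product

def dynamics(state, action):
    """Stand-in for a learned dynamic model: move along positions 0..4."""
    return max(0, min(4, state + action))

def reward(state):
    """Stand-in for a learned reward model: 1 at the goal position 3, else 0."""
    return 1.0 if state == 3 else 0.0

def plan(state, horizon, actions=(-1, 0, 1)):
    """Simulate every action sequence in the model and keep the best one."""
    best_seq, best_return = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = dynamics(s, a)   # simulated step, no real environment involved
            total += reward(s)
        if total > best_return:
            best_seq, best_return = seq, total
    return best_seq, best_return

best_seq, best_return = plan(state=1, horizon=4)
print(best_seq, best_return)
```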
Model-Based Generalization
Ability of a model-based agent to adapt imitated behaviors to new situations not seen in the demonstrations by simulating hypothetical scenarios using its environment model.
Inverse Reinforcement Learning (IRL)
Process of inferring an expert's underlying reward function from their demonstrations; the recovered reward then provides a dense training signal for a reinforcement learning agent.
Backpropagation Through Time (BPTT)
Algorithm used to train recurrent dynamic models, where the loss gradients are calculated by backpropagating errors through the time steps of the simulated trajectory.
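The backward pass can be sketched for the simplest possible recurrent model. The scalar linear model s[t+1] = w * s[t] + b * a[t], the squared-error loss, and all function names are assumptions chosen for illustration; real dynamic models are typically neural networks trained with automatic differentiation.

```python
def rollout(w, b, s0, actions):
    """Unroll the assumed model s[t+1] = w * s[t] + b * a[t] from s0."""
    states = [s0]
    for a in actions:
        states.append(w * states[-1] + b * a)
    return states

def loss(w, b, s0, actions, targets):
    """Sum of squared errors between predicted and target states."""
    states = rollout(w, b, s0, actions)
    return sum(0.5 * (s - y) ** 2 for s, y in zip(states[1:], targets))

def bptt_gradients(w, b, s0, actions, targets):
    """Gradients of the loss w.r.t. w and b, computed backward through time."""
    states = rollout(w, b, s0, actions)
    grad_w = grad_b = 0.0
    delta = 0.0  # gradient of the loss w.r.t. the state one step later
    for t in reversed(range(len(actions))):
        # Local prediction error plus the gradient flowing back from later steps.
        delta = (states[t + 1] - targets[t]) + w * delta
        grad_w += delta * states[t]
        grad_b += delta * actions[t]
    return grad_w, grad_b

# Hypothetical toy trajectory.
actions = [1.0, -0.5, 0.25]
targets = [1.2, 0.4, 0.6]
grad_w, grad_b = bptt_gradients(0.8, 0.5, 1.0, actions, targets)
print(grad_w, grad_b)
```

A finite-difference check on `loss` is a convenient way to confirm that the backward recursion is correct.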
Trajectory Optimization
Class of planning algorithms that iteratively improve an entire trajectory using gradients from the reward and dynamic models, as opposed to value-based methods.
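A minimal sketch of gradient-based trajectory optimization, under strong simplifying assumptions: the dynamics are s[t+1] = s[t] + a[t], the objective rewards staying near a goal and penalizes large actions, and the gradient is derived by hand. The names `simulate`, `objective`, and `optimize_trajectory`, as well as the step size and iteration count, are illustrative choices rather than part of any specific algorithm.

```python
def simulate(s0, actions):
    """Assumed dynamics: each action shifts the scalar state."""
    states = [s0]
    for a in actions:
        states.append(states[-1] + a)
    return states

def objective(s0, actions, goal, action_cost):
    """Negative tracking error to the goal minus a quadratic action penalty."""
    states = simulate(s0, actions)
    tracking = sum((s - goal) ** 2 for s in states[1:])
    effort = action_cost * sum(a ** 2 for a in actions)
    return -(tracking + effort)

def objective_gradient(s0, actions, goal, action_cost):
    """Gradient of the objective w.r.t. every action in the trajectory."""
    states = simulate(s0, actions)
    grads = [0.0] * len(actions)
    downstream = 0.0  # effect of an action on all later states
    for t in reversed(range(len(actions))):
        downstream += -2.0 * (states[t + 1] - goal)
        grads[t] = downstream - 2.0 * action_cost * actions[t]
    return grads

def optimize_trajectory(s0, goal, horizon, action_cost=0.1, lr=0.02, iters=500):
    """Improve the whole action sequence at once by gradient ascent."""
    actions = [0.0] * horizon
    for _ in range(iters):
        grads = objective_gradient(s0, actions, goal, action_cost)
        actions = [a + lr * g for a, g in zip(actions, grads)]
    return actions

actions = optimize_trajectory(s0=0.0, goal=1.0, horizon=5)
print(simulate(0.0, actions))
```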
Behavioral Cloning (BC)
Supervised learning approach that directly models the expert policy π(a|s) by minimizing the error between the agent's actions and the expert's actions for given states.
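For a concrete picture, the sketch below fits a deterministic scalar linear policy a = k * s to expert state-action pairs by least squares. The linear policy class, the function name `fit_linear_policy`, and the toy demonstrations are assumptions; in practice the policy is usually a richer model trained by gradient descent.

```python
def fit_linear_policy(states, expert_actions):
    """Closed-form least-squares fit of the gain k in the policy a = k * s."""
    numerator = sum(s * a for s, a in zip(states, expert_actions))
    denominator = sum(s * s for s in states)
    return numerator / denominator

# Hypothetical demonstrations from an expert that acts as a = -0.5 * s.
demo_states = [1.0, 2.0, -1.0, 4.0]
demo_actions = [-0.5, -1.0, 0.5, -2.0]

k = fit_linear_policy(demo_states, demo_actions)

def policy(state):
    """Cloned policy: imitates the expert on states like those demonstrated."""
    return k * state

print(k, policy(3.0))
```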
Hybrid BC-Model-Based
Architecture combining a behavioral cloning policy for direct imitation with an environment model for planning, where both contributions are merged to produce the agent's final action.
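The definition does not say how the two contributions are merged. One possible scheme, shown purely as an assumption, scores each action by a weighted sum of the cloned policy's log-probability and the value predicted by model-based planning, then picks the highest score. The function `hybrid_action`, the weight `bc_weight`, and the toy numbers are all hypothetical.

```python
import math

def hybrid_action(bc_probs, model_values, bc_weight=0.5):
    """Pick the action maximizing a weighted mix of imitation and planning scores.

    bc_probs: action -> probability under the cloned policy (must be > 0).
    model_values: action -> return predicted by planning in the learned model.
    """
    scores = {
        a: bc_weight * math.log(bc_probs[a]) + (1.0 - bc_weight) * model_values[a]
        for a in bc_probs
    }
    return max(scores, key=scores.get)

# Hypothetical case: the cloned policy prefers "left", the planner prefers "right".
bc_probs = {"left": 0.7, "right": 0.3}
model_values = {"left": 0.0, "right": 1.0}

balanced_choice = hybrid_action(bc_probs, model_values, bc_weight=0.5)
imitation_heavy_choice = hybrid_action(bc_probs, model_values, bc_weight=0.9)
print(balanced_choice, imitation_heavy_choice)
```

Raising `bc_weight` makes the agent follow the demonstrations more closely, while lowering it hands more control to the planner.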