AI Glossary
A Complete Dictionary of Artificial Intelligence
Model-Based Offline RL
Offline reinforcement learning approach that learns a dynamics model of the environment and uses it to generate synthetic data, improving the policy without any real interaction.
Imagination Rollouts
Simulated trajectories produced by unrolling the learned dynamics model, exploring potential future states without interacting with the real environment.
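A minimal sketch of an imagination rollout. The names `learned_model` and `policy` are hypothetical stand-ins: here a toy linear map and a random policy play the role of a trained dynamics model and a trained policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def learned_model(state, action):
    # Toy stand-in for a learned dynamics model: next state and reward.
    next_state = 0.9 * state + 0.1 * action
    reward = -float(np.sum(next_state ** 2))
    return next_state, reward

def policy(state, rng):
    # Placeholder policy: random actions in [-1, 1].
    return rng.uniform(-1.0, 1.0, size=state.shape)

def imagination_rollout(start_state, horizon, rng):
    # Unroll the learned model for `horizon` steps without ever
    # touching the real environment.
    trajectory = []
    state = start_state
    for _ in range(horizon):
        action = policy(state, rng)
        next_state, reward = learned_model(state, action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory

traj = imagination_rollout(np.ones(3), horizon=5, rng=rng)
```

The resulting synthetic transitions can be mixed into the offline dataset to train the policy.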
Conservative Policy Optimization
Algorithm that explicitly penalizes policies that deviate significantly from the behavior observed in the training data, avoiding extrapolation errors on out-of-distribution actions.
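A minimal sketch of such a penalized objective, in the spirit of behavior-regularized methods like TD3+BC. The function name, the squared-error penalty, and the `alpha` coefficient are illustrative assumptions, not a specific published loss.

```python
import numpy as np

def conservative_loss(q_values, policy_actions, data_actions, alpha=2.5):
    # Maximize Q-values while penalizing deviation from the dataset's
    # actions (a simple behavior-regularization penalty).
    bc_penalty = np.mean((policy_actions - data_actions) ** 2)
    return float(-np.mean(q_values) + alpha * bc_penalty)

# A policy that matches the data incurs no penalty...
loss_close = conservative_loss(np.array([1.0, 1.0]),
                               np.array([0.2, 0.4]),
                               np.array([0.2, 0.4]))
# ...while one that strays far from the data is penalized.
loss_far = conservative_loss(np.array([1.0, 1.0]),
                             np.array([2.0, 2.0]),
                             np.array([0.2, 0.4]))
```

Minimizing this loss trades off expected value against staying close to actions the dataset actually contains.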
Uncertainty Quantification
Technique to estimate the uncertainty of the dynamics model in out-of-distribution regions, guiding exploration and avoiding catastrophic errors.
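One common way to use such an uncertainty estimate is to subtract it from the model's predicted reward, as in MOPO-style penalized rewards. The sketch below assumes ensemble disagreement as the uncertainty proxy; the function name and `lam` coefficient are illustrative.

```python
import numpy as np

def penalized_reward(reward, ensemble_predictions, lam=1.0):
    # Penalize rewards in regions where ensemble members disagree
    # (disagreement is a proxy for epistemic uncertainty).
    # ensemble_predictions: (n_models, n_states, state_dim)
    uncertainty = np.std(ensemble_predictions, axis=0).max(axis=-1)
    return reward - lam * uncertainty

# Three ensemble members agree on the first state, disagree on the second.
preds = np.array([
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
])
rewards = np.array([1.0, 1.0])
pen = penalized_reward(rewards, preds)
```

States where the model is unsure are made less attractive, so the policy stays near well-covered regions.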
Ensemble Models
Collection of multiple dynamics models trained with different initializations, estimating epistemic uncertainty through the variance of their predictions.
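A minimal sketch of the idea. As an assumption for compactness, bootstrap resamples of the data stand in for different network initializations, and small polynomial fits stand in for dynamics models; the point is that member disagreement grows outside the training distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training inputs concentrated in [-1, 1].
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(X[:, 0]) + 0.01 * rng.normal(size=100)

# Fit an ensemble of degree-3 polynomial models on bootstrap resamples
# (playing the role of independently initialized dynamics models).
ensemble = []
for _ in range(5):
    idx = rng.integers(0, len(X), size=len(X))
    ensemble.append(np.polyfit(X[idx, 0], y[idx], deg=3))

def epistemic_std(x):
    # Disagreement across ensemble members at input x.
    preds = np.array([np.polyval(c, x) for c in ensemble])
    return preds.std()

in_dist = epistemic_std(0.0)    # inside the training range
out_dist = epistemic_std(4.0)   # far outside it
```

Inside the data's support the members agree closely; far outside it their predictions diverge, which is exactly the signal used to flag unreliable model rollouts.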
Trajectory Transformers
Transformer architecture that models trajectories as sequences of states, actions, and rewards to predict future transitions in offline learning.
Offline-to-Online Transfer
Process of transferring a policy learned offline to an online environment for refinement and continuous adaptation with real interaction.
Model Ensembling
Technique using multiple dynamics models to capture different hypotheses about state transitions and improve prediction robustness.
Advantage Weighted Regression
Offline method that weights actions in the training data according to their estimated advantage, improving the policy beyond simple behavior cloning.
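A minimal sketch of the weighting scheme: exponentiated advantages turn a plain behavior-cloning regression into an advantage-weighted one. The `beta` temperature and the squared-error loss form are the standard AWR ingredients, but the exact function names here are illustrative.

```python
import numpy as np

def awr_weights(advantages, beta=1.0):
    # Exponentiated-advantage weights: actions with higher estimated
    # advantage contribute more to the regression.
    w = np.exp(advantages / beta)
    return w / w.sum()

def awr_loss(policy_actions, data_actions, advantages, beta=1.0):
    # Weighted behavior cloning: regress onto the dataset's actions,
    # but weight each sample by its advantage.
    w = awr_weights(advantages, beta)
    return float(np.sum(w * (policy_actions - data_actions) ** 2))

adv = np.array([0.0, 2.0])
w = awr_weights(adv)
```

With zero advantage everywhere this reduces to uniform behavior cloning; positive advantages tilt the policy toward the better actions in the dataset.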
Out-of-Distribution Detection
Mechanism to identify when states generated by the model significantly deviate from the original training data distribution.
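A minimal sketch of one simple detector, assuming a k-nearest-neighbor distance criterion: a state is flagged when it lies far from every state in the offline dataset. The function name and threshold are illustrative choices, not a canonical method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dataset states cluster near the origin.
data_states = rng.normal(0.0, 1.0, size=(500, 2))

def is_ood(state, data, k=5, threshold=1.0):
    # Flag a state as out-of-distribution when its mean distance to
    # the k nearest dataset states exceeds a threshold.
    dists = np.linalg.norm(data - state, axis=1)
    knn = np.sort(dists)[:k]
    return bool(knn.mean() > threshold)

inside = is_ood(np.array([0.0, 0.0]), data_states)
outside = is_ood(np.array([10.0, 10.0]), data_states)
```

Model rollouts can be truncated, or their rewards penalized, as soon as a generated state is flagged this way.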