Model-Based Imitation Learning
Trajectory Optimization
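A class of planning algorithms that iteratively improve an entire trajectory, typically the full action sequence over a planning horizon, using gradients of the reward and dynamics models. This contrasts with value-based methods, which choose actions through a learned value function.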
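As a rough illustration of the definition above, the sketch below runs gradient ascent on a whole action sequence. Everything in it is an assumption made for the example: the one-dimensional dynamics `x_{t+1} = x_t + dt * u_t`, the quadratic reward, and the helper names `rollout`, `total_reward`, `reward_gradient`, and `optimize_trajectory` are hypothetical stand-ins for learned reward and dynamics models, and the gradient is written out by hand rather than obtained by automatic differentiation.

```python
import numpy as np

def rollout(x0, actions, dt=0.1):
    """Roll out the assumed dynamics model x_{t+1} = x_t + dt * u_t."""
    states = [x0]
    for u in actions:
        states.append(states[-1] + dt * u)
    return np.array(states)  # length T + 1, states[0] == x0

def total_reward(states, actions, goal=1.0, ctrl_cost=0.01):
    """Assumed reward model: stay near the goal, penalize control effort."""
    return -np.sum((states[1:] - goal) ** 2) - ctrl_cost * np.sum(actions ** 2)

def reward_gradient(states, actions, goal=1.0, ctrl_cost=0.01, dt=0.1):
    """Gradient of the total reward with respect to every action."""
    # Derivative of the reward with respect to states 1..T.
    dr_dx = -2.0 * (states[1:] - goal)
    # Action t shifts every later state by dt, so accumulate the
    # state derivatives from the end of the trajectory backwards.
    future = np.cumsum(dr_dx[::-1])[::-1]
    return dt * future - 2.0 * ctrl_cost * actions

def optimize_trajectory(x0=0.0, horizon=20, iters=500, lr=0.2):
    """Improve the whole action sequence at once by gradient ascent."""
    actions = np.zeros(horizon)
    for _ in range(iters):
        states = rollout(x0, actions)
        actions = actions + lr * reward_gradient(states, actions)
    return actions, rollout(x0, actions)

if __name__ == "__main__":
    actions, states = optimize_trajectory()
    print("final state:", states[-1])
    print("total reward:", total_reward(states, actions))
```

Each iteration updates all actions together from a single rollout, which is the sense in which the trajectory is optimized as a whole; no value function is estimated at any point. With learned models, the hand-derived gradient would normally be replaced by backpropagation through the model.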