Online Optimization
Bandit Algorithm
A family of online learning algorithms in which an agent sequentially selects actions with uncertain rewards, observing only the reward of the action it chose, with the goal of maximizing cumulative reward.
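As an illustration of the select-observe-update loop, here is a minimal sketch of epsilon-greedy, one common member of this family. It assumes Bernoulli (0/1) rewards, and the names `epsilon_greedy`, `reward_probs`, `steps`, and `epsilon` are hypothetical, chosen only for this example.

```python
import random


def epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on a Bernoulli bandit (illustrative assumption).

    reward_probs: hidden success probability of each arm (unknown to the agent).
    Returns (pull counts per arm, estimated mean reward per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms      # number of times each arm was pulled
    values = [0.0] * n_arms    # running mean reward estimate per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: values[a])

        # Only the chosen arm's reward is observed (bandit feedback).
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return counts, values, total_reward


if __name__ == "__main__":
    counts, values, total = epsilon_greedy([0.2, 0.8])
    print("pulls per arm:", counts)
    print("estimated means:", [round(v, 3) for v in values])
    print("cumulative reward:", total)
```

The `epsilon` parameter sets the trade-off: a larger value spends more pulls exploring, while a smaller value commits sooner to the arm that currently looks best.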