Online Optimization
Bandit Algorithm
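A family of online learning algorithms in which an agent sequentially selects actions whose rewards are uncertain, aiming to maximize cumulative reward by balancing exploration of less-tried actions against exploitation of the action that currently looks best.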
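As an illustration, here is a minimal sketch of epsilon-greedy, one simple member of this family. It is not the only or a definitive bandit method. The simulated environment (Bernoulli arms with hidden success probabilities), the function name `epsilon_greedy`, and its parameters are hypothetical choices made for the example. With probability `epsilon` the agent explores a random arm. Otherwise it exploits the arm with the highest estimated mean reward.

```python
import random


def epsilon_greedy(arm_probs, n_steps, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a simulated Bernoulli bandit (hypothetical setup).

    arm_probs: success probability of each arm, hidden from the agent.
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k      # how many times each arm has been pulled
    values = [0.0] * k    # running mean of observed rewards per arm
    total_reward = 0.0

    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                         # explore: pick a random arm
        else:
            arm = max(range(k), key=lambda a: values[a])   # exploit: best estimate so far

        # Simulated Bernoulli reward: 1 with probability arm_probs[arm], else 0
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0

        counts[arm] += 1
        # Incremental mean update: Q <- Q + (r - Q) / n
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return counts, total_reward


if __name__ == "__main__":
    counts, total = epsilon_greedy([0.2, 0.5, 0.8], n_steps=5000)
    print("pulls per arm:", counts, "total reward:", total)
```

With these made-up probabilities, the agent typically concentrates most pulls on the third arm, while the small exploration rate keeps refining the estimates of the others. A fixed `epsilon` is the simplest design choice. Decaying it over time, or switching to methods such as UCB or Thompson sampling, are common alternatives.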