Online Optimization
Bandit Algorithm
Family of online learning algorithms in which an agent sequentially selects actions with uncertain rewards, observing only the reward of the action it chose, in order to maximize cumulative gain.
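As an illustration, here is a minimal sketch of epsilon-greedy, one simple member of this family, run against simulated Bernoulli arms. The function name `epsilon_greedy`, the arm probabilities, and the value of `epsilon` are hypothetical choices made for this example and do not come from the entry above. With probability `epsilon` the agent explores a random action; otherwise it exploits the action with the highest estimated reward so far.

```python
import random


def epsilon_greedy(arm_probs, n_rounds, epsilon=0.1, seed=0):
    """Run epsilon-greedy on simulated Bernoulli arms (illustrative sketch).

    arm_probs: hypothetical success probability of each arm; the agent never
               sees these values directly, only sampled rewards.
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward observed per arm
    total_reward = 0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Bandit feedback: only the chosen arm's reward is observed.
        reward = 1 if rng.random() < arm_probs[arm] else 0

        # Incremental update of the chosen arm's mean reward.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return total_reward, counts


total, counts = epsilon_greedy([0.2, 0.5, 0.8], n_rounds=10_000)
print(total, counts)
```

Over enough rounds the pull counts typically concentrate on the arm with the highest success probability, while the `epsilon` fraction of random pulls keeps the estimates of the other arms from going stale. Other members of the family (for example UCB or Thompson sampling) replace the random exploration step with a different selection rule.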