Online Optimization
Bandit Algorithm
A family of online learning algorithms in which an agent repeatedly selects one of several actions ("arms") whose rewards are uncertain, aiming to maximize its cumulative reward over time.
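To make the definition concrete, here is a minimal sketch of one common member of the family, epsilon-greedy, under stated assumptions. The rewards are simulated as Bernoulli (0/1) draws with hidden success probabilities, and the arm probabilities, `epsilon`, `seed`, and the function name `epsilon_greedy` are hypothetical illustration choices, not taken from any particular library. With probability `epsilon` the agent explores a random arm; otherwise it exploits the arm with the highest estimated mean reward.

```python
import random
from typing import List, Tuple


def epsilon_greedy(
    arm_probs: List[float],
    n_rounds: int,
    epsilon: float = 0.1,
    seed: int = 0,
) -> Tuple[List[int], List[float], float]:
    """Run epsilon-greedy against simulated Bernoulli arms (hypothetical setup).

    Returns per-arm pull counts, per-arm estimated mean rewards,
    and the total (cumulative) reward collected.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k          # how many times each arm was pulled
    estimates = [0.0] * k     # running mean of observed rewards per arm
    total_reward = 0.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest estimated mean (first on ties).
            arm = max(range(k), key=lambda a: estimates[a])

        # The true success probability is hidden from the agent; it only
        # observes a 0/1 reward sampled from it.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0

        # Incremental mean update: new = old + (reward - old) / n.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8], n_rounds=5000)
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 2) for e in estimates])
    print("cumulative reward:", total)
```

Over many rounds the agent should concentrate its pulls on the best arm while still sampling the others occasionally. Other members of the family (for example UCB or Thompson sampling) replace the random exploration step with a different rule for balancing exploration against exploitation.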