Reinforcement Learning for Optimization
Multi-Armed Bandit Problem
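A sequential optimization problem in which an agent repeatedly chooses among several options (the "arms"), each with an unknown reward distribution, and tries to maximize its cumulative reward over time. Because rewards are observed only for the chosen arm, the agent must balance exploring uncertain options against exploiting the one that currently looks best.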
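To make the definition concrete, here is a minimal sketch of one common strategy, epsilon-greedy, run against simulated arms. The strategy, the Bernoulli (0/1) reward model, and every name and parameter here (`run_epsilon_greedy`, `true_means`, `epsilon`, `steps`, `seed`) are assumptions chosen for illustration, not part of the definition itself.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on Bernoulli arms (illustrative only).

    true_means: hypothetical success probability of each arm; the agent
    never reads these directly, it only sees sampled rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms       # number of pulls per arm
    estimates = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate so far
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Sample a 0/1 reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("cumulative reward:", total)
```

With a small `epsilon` the agent spends most pulls on whichever arm currently has the highest estimate while still sampling the others occasionally. Other strategies, such as UCB or Thompson sampling, handle the same trade-off differently.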