Contextual Bandits
Regret Minimization
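Objective of keeping regret as small as possible, where regret is the difference between the cumulative reward the optimal policy would have earned and the cumulative reward the algorithm actually obtained. Regret measures the algorithm's performance: the lower it is, the closer the algorithm is to the optimal policy.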
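As a rough illustration of this definition, the sketch below computes cumulative regret in a simulated setting. It assumes the expected reward of every arm in every round's context is known, which is normally true only in simulation, and it takes the optimal policy to be the one that picks the best arm for each context. The function name `cumulative_regret` and both inputs are hypothetical, chosen for this example only.

```python
from itertools import accumulate

def cumulative_regret(expected_rewards, chosen_arms):
    """Return the cumulative regret after each round.

    expected_rewards: one row per round; row[a] is the expected reward of
        arm a in that round's context (assumed known, e.g. in a simulation).
    chosen_arms: the arm index the algorithm picked in each round.
    """
    # Per-round regret: best achievable expected reward minus the expected
    # reward of the arm that was actually chosen.
    per_round = [max(row) - row[arm]
                 for row, arm in zip(expected_rewards, chosen_arms)]
    # Running sum over rounds gives the cumulative regret curve.
    return list(accumulate(per_round))

# Three rounds, two arms (illustrative numbers only).
expected_rewards = [[0.25, 0.75], [1.0, 0.5], [0.5, 0.75]]
chosen_arms = [1, 1, 0]
regret = cumulative_regret(expected_rewards, chosen_arms)
print(regret)  # → [0.0, 0.5, 0.75]
```

The algorithm chose optimally only in the first round, so regret stays at zero there and grows afterwards. A regret-minimizing algorithm aims for this curve to grow more slowly than the number of rounds.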