Contextual Bandits
Bandits with Delayed Feedback
A variant of contextual bandits in which the reward for an action is observed only after a significant delay, so the learner must keep choosing actions before earlier outcomes are known. Algorithms need to be adapted to handle this temporal uncertainty and still learn efficiently.
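One common way to cope with the delay is to buffer each reward until it "arrives" and update the estimates only then. The sketch below illustrates that idea under stated assumptions: a small discrete set of contexts, an epsilon-greedy policy, and a known fixed delay in a toy environment. The class `DelayedEpsilonGreedy`, the function `simulate`, and all parameter names are hypothetical and chosen for illustration; this is not a reference algorithm for the setting.

```python
import heapq
import random
from collections import defaultdict


class DelayedEpsilonGreedy:
    """Tabular epsilon-greedy contextual bandit whose rewards arrive late.

    Illustrative sketch: contexts are assumed to be hashable and few.
    """

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Per-context pull counts and running mean rewards, one slot per arm.
        self.counts = defaultdict(lambda: [0] * n_arms)
        self.means = defaultdict(lambda: [0.0] * n_arms)
        # Min-heap of (arrival_step, seq, context, arm, reward).
        # seq is a unique tie-breaker so contexts are never compared.
        self.pending = []
        self._seq = 0

    def select(self, context):
        """Pick an arm using only the feedback that has already arrived."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        means = self.means[context]
        return max(range(self.n_arms), key=lambda a: means[a])

    def record(self, step, context, arm, reward, delay):
        """Queue a reward that becomes observable at step + delay."""
        heapq.heappush(
            self.pending, (step + delay, self._seq, context, arm, reward)
        )
        self._seq += 1

    def advance(self, step):
        """Apply every queued reward due at or before `step`.

        Returns the number of rewards applied.
        """
        applied = 0
        while self.pending and self.pending[0][0] <= step:
            _, _, context, arm, reward = heapq.heappop(self.pending)
            self.counts[context][arm] += 1
            n = self.counts[context][arm]
            # Incremental update of the running mean.
            self.means[context][arm] += (reward - self.means[context][arm]) / n
            applied += 1
        return applied


def simulate(steps=2000, delay=50, seed=0):
    """Toy environment (an assumption): the best arm equals the context."""
    env_rng = random.Random(seed + 1)
    agent = DelayedEpsilonGreedy(n_arms=2, epsilon=0.1, seed=seed)
    for t in range(steps):
        agent.advance(t)  # absorb feedback that has arrived by now
        context = env_rng.randrange(2)
        arm = agent.select(context)
        p = 0.8 if arm == context else 0.2
        reward = 1.0 if env_rng.random() < p else 0.0
        agent.record(t, context, arm, reward, delay)
    return agent
```

Calling `simulate()` runs the loop end to end; the rewards from the last `delay` steps remain in `agent.pending` because they have not yet arrived. Keeping the buffer separate from the estimates means the selection rule never sees a reward early, which is the constraint that distinguishes this setting from the standard one.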