Contextual Bandits
Policy Learning
An approach in which the algorithm learns a policy directly: a function that maps each context to the action expected to yield the highest reward. It skips the intermediate step of estimating a value for every action and optimizes the decision rule itself.
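As a rough illustration, the sketch below learns a linear softmax policy with a REINFORCE-style policy-gradient update. This is one common way to do direct policy learning, not the only one. Everything in it is assumed for the example: the synthetic environment (`w_true`, a 0/1 reward), the function names, and the hyperparameters.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def train_policy(n_rounds=5000, n_features=5, n_actions=3, lr=0.1, seed=0):
    """Learn policy weights theta directly from bandit feedback."""
    rng = np.random.default_rng(seed)
    # Hypothetical environment: the best action for context x is argmax(w_true @ x).
    # The learner never sees w_true, only the reward of the action it took.
    w_true = rng.normal(size=(n_actions, n_features))
    theta = np.zeros((n_actions, n_features))   # policy parameters
    baseline = 0.0                              # running mean reward, reduces variance
    for t in range(1, n_rounds + 1):
        x = rng.normal(size=n_features)         # observe a context
        probs = softmax(theta @ x)              # policy: context -> action distribution
        a = rng.choice(n_actions, p=probs)      # sample an action from the policy
        r = float(a == np.argmax(w_true @ x))   # reward 1 if the best action was chosen
        # Gradient of log pi(a | x) w.r.t. theta: (onehot(a) - probs) outer x
        grad_logp = -np.outer(probs, x)
        grad_logp[a] += x
        theta += lr * (r - baseline) * grad_logp
        baseline += (r - baseline) / t
    return theta, w_true

def greedy_accuracy(theta, w_true, n_eval=2000, seed=1):
    """Fraction of fresh contexts where the greedy policy picks the best action."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_eval, theta.shape[1]))
    chosen = np.argmax(X @ theta.T, axis=1)
    best = np.argmax(X @ w_true.T, axis=1)
    return float(np.mean(chosen == best))

if __name__ == "__main__":
    theta, w_true = train_policy()
    print("greedy accuracy:", greedy_accuracy(theta, w_true))
```

Note that no per-action value is ever estimated: the only learned object is `theta`, and each update pushes the probability of the sampled action up or down depending on whether its reward beat the running baseline.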