Contextual Bandits
Action-Value Function
The function Q(a, x) estimates the expected reward of taking action a in context x. In a contextual bandit this is the immediate reward for that single decision, and the estimate is fundamental for policy evaluation.
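As an illustration, Q(a, x) can be approximated by averaging the rewards observed for each (action, context) pair. The sketch below is a minimal example under stated assumptions, not a reference implementation: it assumes discrete actions and contexts, and the class name `TabularActionValue`, the context and action labels, and the logged rewards are all hypothetical.

```python
from collections import defaultdict


class TabularActionValue:
    """Running-mean estimate of Q(a, x) for discrete actions and contexts."""

    def __init__(self):
        self.counts = defaultdict(int)    # (action, context) -> number of observations
        self.values = defaultdict(float)  # (action, context) -> mean observed reward

    def update(self, action, context, reward):
        key = (action, context)
        self.counts[key] += 1
        # Incremental mean: Q <- Q + (r - Q) / n
        self.values[key] += (reward - self.values[key]) / self.counts[key]

    def q(self, action, context):
        # Unseen pairs default to 0.0 (an arbitrary choice for this sketch).
        return self.values.get((action, context), 0.0)

    def greedy_action(self, actions, context):
        # Pick the action with the highest estimated value in this context.
        return max(actions, key=lambda a: self.q(a, context))


# Hypothetical logged interactions: (context, action, reward)
logged = [
    ("mobile", "ad_A", 1.0),
    ("mobile", "ad_A", 0.0),
    ("mobile", "ad_B", 1.0),
    ("desktop", "ad_A", 1.0),
]

estimator = TabularActionValue()
for context, action, reward in logged:
    estimator.update(action, context, reward)

print(estimator.q("ad_A", "mobile"))
print(estimator.greedy_action(["ad_A", "ad_B"], "mobile"))
```

With continuous or high-dimensional contexts, a table like this would typically be replaced by a regression model that maps (a, x) to a predicted reward; the averaging idea stays the same.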