Reinforcement Learning for Optimization
Epsilon-Greedy Policy
Action-selection strategy in which, with probability ε, the agent explores (chooses an action uniformly at random) and, with probability 1-ε, it exploits (chooses the action with the highest estimated value so far).
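The rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `epsilon_greedy`, the `q_values` list of per-action value estimates, and the optional `rng` parameter are all assumptions introduced for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen by the epsilon-greedy rule.

    q_values: hypothetical list of estimated values, one per action.
    epsilon:  exploration probability in [0, 1].
    rng:      any object with random() and randrange(), e.g. random.Random(seed).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value
    # (ties resolve to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the function always returns the greedy action, and with `epsilon=1.0` it always samples at random. In practice ε is often decreased over time so the agent explores more early on, though that schedule is a design choice outside this sketch.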