Epsilon-Greedy Algorithms

Terms

Epsilon exploration rate

Control parameter ε, between 0 and 1, that sets the probability of exploring rather than exploiting at each step of the epsilon-greedy algorithm. Its value affects both how quickly learning converges and the quality of the final policy.

Greedy action

Action with the highest estimated value according to the agent's current knowledge. In epsilon-greedy, the agent exploits by choosing this action with probability 1-ε; because the random exploration step can also land on it, its overall selection probability is slightly higher.

Random exploration

Selection of an action uniformly at random from all available actions. In epsilon-greedy, this strategy is applied with probability ε to discover potentially rewarding options that the current estimates undervalue.
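The selection rule described by the three entries above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `epsilon_greedy` and the list-of-floats representation of the value estimates are assumptions made for the example.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given a list of estimated action values.

    Hypothetical helper for illustration; q_values[i] is the current
    estimate for action i.
    """
    if rng.random() < epsilon:
        # Explore: every action is equally likely, including the greedy one.
        return rng.randrange(len(q_values))
    # Exploit: take the action with the highest current estimate
    # (ties are broken in favor of the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the function always returns the greedy action; with `epsilon=1.0` it always samples uniformly.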

Epsilon decay

Technique in which the epsilon value gradually decreases over time, favoring exploration early in learning and exploitation later. This tends to give more stable convergence toward an optimal policy.
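One common way to realize such a decay is an exponential schedule with a lower bound. The sketch below assumes this particular schedule; the function name and the default values (`eps_start`, `eps_min`, `decay_rate`) are illustrative choices, not values taken from this glossary.

```python
def decayed_epsilon(step, eps_start=1.0, eps_min=0.05, decay_rate=0.99):
    """Exponentially decayed epsilon, never dropping below eps_min.

    All parameter defaults are hypothetical and would need tuning
    for a real problem.
    """
    return max(eps_min, eps_start * decay_rate ** step)
```

The floor `eps_min` keeps a small amount of exploration alive indefinitely; setting it to 0 lets exploration vanish entirely.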

Optimistic epsilon-greedy

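Variant that initializes action-value estimates to optimistically high values to encourage early exploration. Because every untried action looks better than any action already sampled, the agent is pushed to try each action at least once, provided the initial values exceed the rewards actually obtainable.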
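The effect of optimistic initialization can be seen in a small simulation. The sketch below makes several simplifying assumptions: rewards are deterministic, action selection is purely greedy (epsilon = 0) so that the optimism is the only source of exploration, and estimates are updated as running averages. The function name and the initial value of 5.0 are hypothetical.

```python
def run_optimistic_greedy(true_rewards, steps, initial_value=5.0):
    """Greedy bandit run with optimistic initial estimates.

    true_rewards[i] is the (deterministic, for simplicity) reward of
    action i; initial_value is assumed to exceed every true reward.
    """
    k = len(true_rewards)
    q = [initial_value] * k      # optimistic starting estimates
    counts = [0] * k             # number of times each action was taken
    for _ in range(steps):
        # Always act greedily; untried actions still look best.
        a = max(range(k), key=lambda i: q[i])
        r = true_rewards[a]
        counts[a] += 1
        # Incremental running-average update of the estimate.
        q[a] += (r - q[a]) / counts[a]
    return q, counts

q, counts = run_optimistic_greedy([0.2, 0.8, 0.5], steps=10)
```

In this run each action is tried once during the first three steps, after which the agent settles on the action with reward 0.8.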

Cumulative regret

Performance measure quantifying the gap between the total reward an optimal policy would have collected and the total reward the algorithm actually obtained. Lower regret indicates a more efficient learning policy.
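In a simulated bandit where the true mean reward of each action is known, regret can be tracked step by step. The sketch below computes the expected-value form of cumulative regret (using true means rather than sampled rewards), which is an assumption of this example; `cumulative_regret` and its arguments are hypothetical names.

```python
def cumulative_regret(true_means, chosen_actions):
    """Running cumulative regret for a sequence of chosen action indices.

    Uses the true mean rewards, which are only available in simulation.
    """
    best = max(true_means)       # mean reward of the optimal action
    regret, total = [], 0.0
    for a in chosen_actions:
        # Each step adds the gap between the best and the chosen action.
        total += best - true_means[a]
        regret.append(total)
    return regret
```

A curve that flattens over time means the algorithm has mostly stopped choosing suboptimal actions.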

Algorithm convergence

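Property that the policy learned by the epsilon-greedy algorithm approaches the optimal policy under certain conditions. Typically, every action must keep being tried often enough for its estimate to become accurate while epsilon decays appropriately toward zero, and a sufficient number of iterations is required.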

Value initialization

Process of assigning initial values to reward estimates for each action at the beginning of learning. The initialization strategy significantly influences the agent's initial exploratory behavior.

Pure greedy policy

Strategy with epsilon = 0: the agent always exploits the action it currently estimates to be best and never explores. Such a policy can lock onto a suboptimal action prematurely, because inaccurate early estimates are never corrected.

Epsilon annealing

Technique for gradual and controlled reduction of the epsilon value during learning. Annealing enables a smooth transition from exploration to exploitation to improve convergence.
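A linear schedule is one simple way to obtain such a controlled transition. The sketch below assumes linear interpolation between a start and end value over a fixed number of steps; the function name and defaults are illustrative only.

```python
def linear_epsilon(step, eps_start=1.0, eps_end=0.1, anneal_steps=1000):
    """Linearly anneal epsilon from eps_start to eps_end, then hold.

    Parameter defaults are hypothetical, not recommended values.
    """
    # Fraction of the annealing period completed, capped at 1.
    fraction = min(step / anneal_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)
```

After `anneal_steps` steps the value stays at `eps_end`, so the agent keeps a fixed residual exploration rate.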
