AI Glossary
The Complete Dictionary of Artificial Intelligence
Epsilon exploration rate
Control parameter ε (between 0 and 1) in the epsilon-greedy algorithm that sets the probability of exploring rather than exploiting at each step. Its value directly affects how quickly learning converges and the quality of the final learned policy.
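A minimal sketch of how ε drives the explore/exploit choice, assuming action-value estimates are kept in a plain Python list. The function name `epsilon_greedy_action` and the `rng` parameter are illustrative, not taken from any particular library.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon, otherwise exploit.

    q_values: current value estimates, one per action (assumed non-empty).
    epsilon: exploration rate in [0, 1].
    rng: any object with .random() and .randrange(); defaults to the random module.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: max() returns the first action with the highest estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Usage: with epsilon = 0.1, action 1 (highest estimate) should dominate.
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[epsilon_greedy_action([0.2, 0.8, 0.5], 0.1, rng)] += 1
print(counts)
```

With ε = 0 the function always exploits, and with ε = 1 it always explores, which makes the two extremes easy to check.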
Greedy action
Action with the highest estimated value according to the agent's current knowledge. In epsilon-greedy, this action is chosen deliberately with probability 1-ε during exploitation; because exploration samples uniformly from all actions, it can also be picked by chance while exploring.
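Selecting the greedy action is an argmax over the current estimates. The sketch below breaks ties uniformly at random, which is a common convention but an assumption here; the name `greedy_action` is illustrative.

```python
import random


def greedy_action(q_values, rng=random):
    """Return an action with the highest estimated value.

    Ties are broken uniformly at random so that equally valued actions
    are not systematically biased toward the lowest index.
    """
    best = max(q_values)
    candidates = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(candidates)


print(greedy_action([0.1, 0.5, 0.3]))  # 1
```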
Random exploration
Process of selecting an action uniformly at random from all available actions. In epsilon-greedy, this strategy is applied with probability ε so that the agent can discover options that may turn out to be more rewarding than its current estimates suggest.
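Combining uniform exploration with greedy exploitation gives the action probabilities of the epsilon-greedy policy. The expression below assumes a single greedy action $a^{*}$ and an action set $\mathcal{A}$ over which exploration is uniform (greedy action included):

```latex
\pi(a) =
\begin{cases}
1 - \varepsilon + \dfrac{\varepsilon}{|\mathcal{A}|}, & a = a^{*} = \arg\max_{a'} Q(a'),\\[6pt]
\dfrac{\varepsilon}{|\mathcal{A}|}, & a \neq a^{*}.
\end{cases}
```

For example, with $\varepsilon = 0.1$ and four actions, the greedy action is taken with probability $0.9 + 0.025 = 0.925$, and each other action with probability $0.025$.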
Epsilon decay
Technique in which ε is gradually reduced over time, favoring exploration early in learning and exploitation later on. This approach tends to produce more stable convergence toward an optimal policy.
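One common way to decay ε is multiplicatively with a lower floor. This sketch uses that exponential shape; the function name and the default values (`eps_start=1.0`, `eps_min=0.01`, `decay_rate=0.995`) are illustrative assumptions rather than recommended settings.

```python
def decayed_epsilon(step, eps_start=1.0, eps_min=0.01, decay_rate=0.995):
    """Exponential (multiplicative) epsilon decay with a floor.

    step: number of completed steps or episodes, starting at 0.
    Default parameter values are illustrative only.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return max(eps_min, eps_start * decay_rate ** step)


print(decayed_epsilon(0), decayed_epsilon(500), decayed_epsilon(5000))
```

With these defaults ε hits the floor after roughly 920 steps. Keeping a small floor preserves some exploration indefinitely, which can help when the reward distribution may change over time.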
Optimistic epsilon-greedy
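Variant of the algorithm that initializes action-value estimates with deliberately high (optimistic) values to encourage early exploration. When the initial estimates exceed any achievable reward, every untried action looks better than any tried one, which drives the agent to try each action at least once.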
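The sketch below isolates the effect of optimistic initialization by setting ε = 0 and using deterministic rewards. The environment interface (a list of zero-argument reward callables) and `initial_value=5.0` are hypothetical choices for the example, and the initial value is assumed to exceed every reward.

```python
def run_optimistic_greedy(reward_fns, n_steps, initial_value=5.0):
    """Greedy agent (epsilon = 0) with optimistic initial estimates.

    reward_fns: one zero-argument callable per action (hypothetical interface).
    Assumes initial_value is larger than any reward the callables return.
    Returns the sequence of chosen action indices.
    """
    k = len(reward_fns)
    q = [initial_value] * k
    n = [0] * k
    chosen = []
    for _ in range(n_steps):
        # Greedy choice; max() returns the lowest index among ties.
        a = max(range(k), key=lambda i: q[i])
        r = reward_fns[a]()
        n[a] += 1
        q[a] += (r - q[a]) / n[a]  # sample-average update
        chosen.append(a)
    return chosen


print(run_optimistic_greedy([lambda: 1.0, lambda: 2.0, lambda: 0.5], 5))
# [0, 1, 2, 1, 1]
```

Each tried action's estimate drops below the optimistic value, so the untried actions are selected first. After that, the agent settles on the best observed action.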
Cumulative regret
Performance measure quantifying the difference between the total reward that always choosing the optimal action would have earned (in expectation) and the total reward the algorithm actually obtained. Lower cumulative regret indicates a more efficient learning policy.
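In simulations, the true mean reward of each action is known, so regret can be computed by summing the gap between the best mean and the mean of each chosen action. This is often called pseudo-regret. The sketch below assumes that setting, and the function name is illustrative.

```python
from itertools import accumulate


def cumulative_regret(true_means, chosen_actions):
    """Running regret: sum over steps of (best mean - mean of chosen action).

    true_means: expected reward of each action, assumed known (simulation setting).
    chosen_actions: action indices picked by the agent, in order.
    Returns the regret accumulated after each step.
    """
    best = max(true_means)
    per_step = [best - true_means[a] for a in chosen_actions]
    return list(accumulate(per_step))


print(cumulative_regret([0.2, 0.5, 0.9], [0, 2, 1, 2]))  # ends near 1.1
```

Because each per-step gap is non-negative, the curve never decreases. A curve that flattens out indicates the agent has settled on the optimal action.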
Algorithm convergence
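Property that the epsilon-greedy algorithm reaches the optimal policy under certain conditions. Convergence requires that ε decay appropriately (toward zero, but slowly enough that every action keeps being explored) and that learning run for a sufficient number of iterations. With a fixed ε > 0, the agent never stops exploring, so its behavior remains only near-optimal.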
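For a stationary multi-armed bandit with sample-average estimates and independent exploration draws, one sufficient pair of conditions on the schedule $\varepsilon_t$ is the following. This is a sketch of the usual "greedy in the limit with infinite exploration" idea, not a full proof:

```latex
\lim_{t \to \infty} \varepsilon_t = 0
\qquad \text{and} \qquad
\sum_{t=1}^{\infty} \varepsilon_t = \infty .
```

The second condition means each action, explored with probability $\varepsilon_t / |\mathcal{A}|$ at step $t$, is tried infinitely often, so its estimate converges to its true mean. The first condition makes the policy greedy in the limit. For instance, $\varepsilon_t = 1/t$ satisfies both conditions, whereas $\varepsilon_t = 1/t^2$ decays too fast to satisfy the second.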
Value initialization
Process of assigning initial values to reward estimates for each action at the beginning of learning. The initialization strategy significantly influences the agent's initial exploratory behavior.
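How long the initial value keeps influencing behavior depends on the update rule. With sample averages ($\alpha = 1/n$), the first update replaces the initial value entirely. With a constant step size $\alpha$, the initial value keeps weight $(1-\alpha)^n$ after $n$ updates. The helper below is a hypothetical illustration for a single action.

```python
def estimate_after_updates(initial_value, rewards, step_size=None):
    """Incrementally update one action's value estimate.

    step_size=None uses the sample average (alpha = 1/n);
    a float uses that constant step size alpha for every update.
    """
    q = initial_value
    for n, r in enumerate(rewards, start=1):
        alpha = 1.0 / n if step_size is None else step_size
        q += alpha * (r - q)
    return q


# Sample average: the initial value no longer matters after the first update.
print(estimate_after_updates(10.0, [2.0, 4.0]), estimate_after_updates(-7.0, [2.0, 4.0]))  # 3.0 3.0
# Constant step size 0.5: 10.0 * 0.5 ** 3 of the initial value remains.
print(estimate_after_updates(10.0, [0.0, 0.0, 0.0], step_size=0.5))  # 1.25
```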
Pure greedy policy
Strategy with ε = 0 that always exploits the action currently estimated to be best, without any exploration. This policy may converge prematurely to a suboptimal action (a local optimum), because an action whose value is underestimated early on may never be tried again.
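A deliberately tiny, hypothetical setup shows how a pure greedy policy can lock in. There are two actions with fixed rewards of 1.0 and 2.0, estimates start at zero, and ties go to the lowest index.

```python
def run_pure_greedy(rewards, n_steps):
    """Pure greedy (epsilon = 0) with zero initial estimates on fixed rewards.

    rewards: deterministic reward of each action (illustrative setup).
    Returns how many times each action was chosen.
    """
    k = len(rewards)
    q = [0.0] * k
    n = [0] * k
    for _ in range(n_steps):
        a = max(range(k), key=lambda i: q[i])  # ties -> lowest index
        n[a] += 1
        q[a] += (rewards[a] - q[a]) / n[a]  # sample-average update
    return n


print(run_pure_greedy([1.0, 2.0], 100))  # [100, 0]
```

On the first step the agent picks action 0 because of the tie. Its estimate then rises to 1.0, which is above the untouched 0.0 of action 1, so the better action is never tried.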
Epsilon annealing
Technique for gradual and controlled reduction of the epsilon value during learning. Annealing provides a smooth transition from exploration to exploitation, which helps convergence.
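A widely used annealing schedule interpolates ε linearly from a start value to an end value over a fixed number of steps and then holds it there. Both "decay" and "annealing" can refer to schedules of various shapes. The linear form, the function name, and the defaults below are illustrative assumptions.

```python
def linear_annealed_epsilon(step, eps_start=1.0, eps_end=0.05, anneal_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end over anneal_steps, then hold.

    Default parameter values are illustrative only.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step >= anneal_steps:
        return eps_end
    fraction = step / anneal_steps
    return eps_start + fraction * (eps_end - eps_start)


print(linear_annealed_epsilon(0), linear_annealed_epsilon(5_000), linear_annealed_epsilon(20_000))
```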