UCB Algorithms - AI Glossary

UCB1

Baseline UCB algorithm that uses Hoeffding's inequality to compute confidence bounds, guaranteeing logarithmic regret in stationary stochastic bandits with bounded rewards.
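
As an illustration, the UCB1 index can be sketched as the empirical mean plus a Hoeffding-style bonus. This is a minimal sketch assuming rewards in [0, 1]; the function and parameter names (`ucb1_index`, `select_arm`, `means`, `pulls`, `t`) are chosen for this example and do not come from the glossary.

```python
import math

def ucb1_index(mean, pulls, t):
    """Empirical mean plus the exploration bonus sqrt(2 ln t / n)."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)

def select_arm(means, pulls, t):
    """Pick the arm to play at round t (t counts from 1)."""
    # Play every arm once first: the index is undefined for zero pulls.
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    # Otherwise play the arm with the largest upper confidence bound.
    return max(range(len(means)), key=lambda a: ucb1_index(means[a], pulls[a], t))
```

With equal empirical means, the arm that has been pulled less often gets the larger bonus and is selected.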

UCB1-Tuned

Refined variant of UCB1 that dynamically adapts the confidence bounds to the observed variance of the rewards in order to tune exploration.
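
A possible form of the UCB1-Tuned index is sketched below, assuming rewards in [0, 1]. The bonus is scaled by the smaller of 1/4 (the largest variance a [0, 1] variable can have) and an upper bound on the arm's variance. The names `ucb1_tuned_index` and `mean_sq` (the empirical mean of squared rewards) are illustrative.

```python
import math

def ucb1_tuned_index(mean, mean_sq, pulls, t):
    """UCB1-Tuned style index for one arm.

    mean    -- empirical mean reward of the arm
    mean_sq -- empirical mean of the squared rewards of the arm
    pulls   -- number of times the arm has been played (>= 1)
    t       -- current round (>= 1)
    """
    log_t = math.log(t)
    # Upper confidence bound on the arm's variance.
    variance_bound = mean_sq - mean ** 2 + math.sqrt(2.0 * log_t / pulls)
    # 1/4 caps the variance of any reward distribution on [0, 1].
    return mean + math.sqrt((log_t / pulls) * min(0.25, variance_bound))
```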

UCB-V

UCB algorithm that explicitly uses empirical variance estimates to build tighter confidence bounds, particularly effective when the reward variance is small relative to the reward range.
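
The variance-aware bound can be sketched as an empirical-Bernstein style index: one term that scales with the estimated variance and a correction term that scales with the reward range. The constants `zeta` and `c` and all names below are assumptions made for this example, not values given in the glossary.

```python
import math

def ucb_v_index(mean, variance, pulls, t, reward_range=1.0, zeta=1.2, c=1.0):
    """UCB-V style index for one arm.

    variance     -- empirical variance of the arm's rewards
    reward_range -- width b of the interval containing the rewards
    zeta, c      -- tuning constants (illustrative defaults)
    """
    exploration = zeta * math.log(t)
    variance_term = math.sqrt(2.0 * variance * exploration / pulls)
    range_term = 3.0 * c * reward_range * exploration / pulls
    return mean + variance_term + range_term
```

An arm with low observed variance receives a smaller bonus than an equally sampled arm with high variance, so it is abandoned sooner if its mean is poor.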

UCB-alpha

Parametric generalization of UCB1 in which the parameter alpha controls how aggressively the algorithm explores, so the exploration-exploitation trade-off can be tuned to the problem at hand.
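
One common way to write the alpha-parameterized index is sketched below. Conventions for where alpha enters the bonus differ between references; this sketch assumes the form sqrt(alpha ln t / (2 n)), under which alpha = 4 gives the UCB1 bonus sqrt(2 ln t / n). The name `ucb_alpha_index` is illustrative.

```python
import math

def ucb_alpha_index(mean, pulls, t, alpha=4.0):
    """Empirical mean plus the bonus sqrt(alpha * ln t / (2 * n)).

    Larger alpha widens the confidence bound and increases exploration;
    smaller alpha makes the algorithm greedier.
    """
    return mean + math.sqrt(alpha * math.log(t) / (2.0 * pulls))
```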

MOSS (Minimax Optimal Strategy in the Stochastic case)

UCB-style algorithm that is minimax optimal: it attains the best possible worst-case regret rate by scaling each arm's confidence bound with the horizon, the number of arms, and the number of times the arm has been pulled.
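
A minimal sketch of a MOSS-style index follows, assuming the horizon (total number of rounds) is known in advance. The names `moss_index`, `horizon`, and `n_arms` are chosen for this example.

```python
import math

def moss_index(mean, pulls, horizon, n_arms):
    """Empirical mean plus sqrt(max(ln(T / (K * n)), 0) / n).

    horizon -- total number of rounds T, assumed known
    n_arms  -- number of arms K
    pulls   -- number of times this arm has been played (>= 1)
    """
    # The bonus vanishes once the arm has been pulled at least T / K times.
    log_term = max(math.log(horizon / (n_arms * pulls)), 0.0)
    return mean + math.sqrt(log_term / pulls)
```

Unlike UCB1, the bonus does not grow with the current round; it depends only on how the arm's pull count compares with an equal share of the horizon.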

KL-UCB

UCB variant that uses the Kullback-Leibler divergence to build its confidence bounds; it is asymptotically optimal for Bernoulli rewards and well suited to bounded rewards in general.

Logarithmic Regret

Performance guarantee of UCB algorithms: the cumulative regret grows logarithmically with the number of rounds, which is the optimal growth rate in stationary stochastic bandit problems.
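
To make the quantity concrete, the cumulative (pseudo-)regret of a sequence of plays can be computed as below when the true arm means are known, as they are in a simulation. The function name and arguments are hypothetical.

```python
def cumulative_regret(arm_means, chosen_arms):
    """Running sum of (best mean - mean of the arm played) over the rounds.

    arm_means   -- true expected reward of each arm (known only in simulation)
    chosen_arms -- index of the arm played at each round
    """
    best = max(arm_means)
    total = 0.0
    curve = []
    for arm in chosen_arms:
        total += best - arm_means[arm]
        curve.append(total)
    return curve
```

For an algorithm with logarithmic regret, the final entry of this curve grows roughly like the logarithm of the number of rounds rather than linearly.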

Optimism in the Face of Uncertainty

Design principle behind UCB algorithms: treat each action as if its value were the highest one still plausible given the data, so that uncertain actions are explored because they might turn out to be optimal.

Confidence Index

Index quantifying the statistical certainty of an action's value estimate, used to weight exploration in advanced variants of UCB algorithms.

Asymptotic Optimality

Theoretical property guaranteeing that, as the number of rounds grows, a UCB algorithm's regret matches the lowest achievable regret bound, characterizing its long-term efficiency.

UCB-Normal

UCB variant designed for normally distributed rewards with unknown mean and variance, using properties of the Gaussian distribution and the sample variance to build its confidence bounds.

Efficient UCB

Family of UCB algorithms achieving optimal computational complexity while preserving logarithmic regret guarantees for large-scale problems.

Upper Confidence Trees (UCT)

Application of the UCB principle to search trees for sequential decision-making. It underlies many game-playing programs based on Monte Carlo Tree Search, and AlphaGo uses a related selection rule.
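
The selection step of UCT can be sketched as UCB1 applied to the children of a tree node. This sketch assumes each child is summarized by a `(total_reward, visits)` pair and takes the parent's visit count as the sum of the children's visits; the function name and the default exploration constant are illustrative.

```python
import math

def uct_select(children, exploration=math.sqrt(2.0)):
    """Return the index of the child to descend into.

    children -- list of (total_reward, visits) pairs, one per child node
    """
    parent_visits = sum(visits for _, visits in children)
    best_index, best_score = 0, float("-inf")
    for index, (total_reward, visits) in enumerate(children):
        if visits == 0:
            return index  # expand unvisited children first
        score = (total_reward / visits
                 + exploration * math.sqrt(math.log(parent_visits) / visits))
        if score > best_score:
            best_index, best_score = index, score
    return best_index
```

Setting `exploration` to 0 reduces the rule to greedy selection on the average reward.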
