AI Glossary
A Complete Glossary of Artificial Intelligence
Monte Carlo Tree Search
Heuristic search algorithm that explores a decision tree, using random simulations to evaluate future actions in a modeled environment. Combines incremental tree search with Monte Carlo evaluation to approximate optimal decisions.
Expansion
Phase of MCTS where a new child node is added to the search tree from a selected non-terminal node. Allows exploring new decision branches in the state space.
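As a rough illustration of expansion, the sketch below adds one child for an action that has not been tried yet. The `Node` class, its `untried` list, and the toy domain (a counter that grows by 1 or 2 until it reaches 5) are assumptions made for this example only; they do not come from any particular library.

```python
import random
from typing import List, Optional

GOAL = 5  # hypothetical toy domain: a counter that may grow up to 5

def legal_actions(state: int) -> List[int]:
    """Actions available in `state`; the list is empty in a terminal state."""
    return [a for a in (1, 2) if state + a <= GOAL]

def apply_action(state: int, action: int) -> int:
    """Deterministic toy transition: add the action to the counter."""
    return state + action

class Node:
    def __init__(self, state: int, parent: Optional["Node"] = None,
                 action: Optional[int] = None) -> None:
        self.state = state
        self.parent = parent
        self.action = action                  # action that led from parent to here
        self.children: List["Node"] = []
        self.untried = legal_actions(state)   # every legal action starts untried

def expand(node: Node, rng: random.Random) -> Node:
    """Create one child for a randomly chosen untried action and return it."""
    if not node.untried:
        raise ValueError("node is terminal or already fully expanded")
    action = node.untried.pop(rng.randrange(len(node.untried)))
    child = Node(apply_action(node.state, action), parent=node, action=action)
    node.children.append(child)
    return child
```

Calling `expand(Node(0), random.Random(0))` returns a child whose `parent` is the original node. Adding one child per iteration is a common choice, but implementations that add several children at once also fit the definition above.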
Simulation
Phase of MCTS in which a random playout (rollout) is run from a tree node to a terminal state to evaluate the quality of an action. Uses a default policy to generate a complete trajectory and estimate the reward.
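A minimal sketch of the simulation phase, assuming a hypothetical toy task in which a running total must land exactly on 10 by repeatedly adding 1, 2, or 3. The names `simulate`, `random_policy`, and `estimate_value` are illustrative, and the uniform random policy is just one possible default policy.

```python
import random
from typing import Callable

TARGET = 10  # hypothetical toy task: land exactly on 10

def is_terminal(total: int) -> bool:
    return total >= TARGET

def terminal_reward(total: int) -> float:
    # 1.0 for hitting the target exactly, 0.0 for overshooting it.
    return 1.0 if total == TARGET else 0.0

def random_policy(total: int, rng: random.Random) -> int:
    # Default policy assumed here: uniform choice among the three actions.
    return rng.choice((1, 2, 3))

def simulate(total: int, policy: Callable[[int, random.Random], int],
             rng: random.Random) -> float:
    """Play from `total` to a terminal state and return the final reward."""
    while not is_terminal(total):
        total += policy(total, rng)
    return terminal_reward(total)

def estimate_value(total: int, n_rollouts: int, rng: random.Random) -> float:
    """Average the reward of several independent rollouts from the same state."""
    rewards = (simulate(total, random_policy, rng) for _ in range(n_rollouts))
    return sum(rewards) / n_rollouts
```

From a total of 9, only the action 1 hits the target, so `estimate_value(9, n, rng)` should approach 1/3 as `n` grows. A single rollout is a noisy estimate, which is why MCTS averages many of them.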
Backpropagation
Phase of updating node statistics along the traversed path after a simulation. Propagates obtained rewards to parent nodes to refine value estimates.
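The update can be sketched as a walk from the simulated node up to the root. The `Node` fields `visits` and `total_reward` are assumed names for the two statistics a node typically stores; this is an illustration, not a reference implementation.

```python
from typing import Optional

class Node:
    def __init__(self, parent: Optional["Node"] = None) -> None:
        self.parent = parent
        self.visits = 0           # how many simulations passed through this node
        self.total_reward = 0.0   # sum of the rewards of those simulations

def backpropagate(node: Optional[Node], reward: float) -> None:
    """Add `reward` to every node on the path from `node` up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent

# Usage: a root with two branches, one simulation through each.
root = Node()
middle = Node(parent=root)
leaf = Node(parent=middle)
other = Node(parent=root)
backpropagate(leaf, 1.0)
backpropagate(other, 0.0)
```

After these two calls the root has 2 visits and a total reward of 1.0, while `middle` and `leaf` have 1 visit each. In two-player zero-sum games, implementations commonly flip the reward's perspective at each level on the way up; that step is omitted here.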
Selection
First phase of MCTS, in which the tree is descended from the root by following a selection policy until a node suitable for expansion is reached. Typically uses the UCB1 formula to balance exploration and exploitation.
UCB1
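Selection formula in MCTS that combines a node's average value with an exploration term based on visit counts: the score is the mean reward plus c * sqrt(ln N / n), where N is the parent's visit count, n is the node's visit count, and c is an exploration constant. Under standard assumptions (bounded rewards, finite horizon) and a suitable c, MCTS with UCB1 (the UCT algorithm) converges asymptotically to the optimal action.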
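The score can be sketched in a few lines. The function names, the representation of a child as a `(total_reward, visits)` pair, and the default `c = sqrt(2)` (a common choice for rewards in [0, 1]) are assumptions made for this example.

```python
import math
from typing import List, Tuple

def ucb1(total_reward: float, visits: int, parent_visits: int,
         c: float = math.sqrt(2)) -> float:
    """Mean reward plus an exploration bonus that shrinks as visits grow."""
    if visits == 0:
        return math.inf  # unvisited children are tried before any other
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children: List[Tuple[float, int]], parent_visits: int,
                 c: float = math.sqrt(2)) -> int:
    """Return the index of the (total_reward, visits) pair with the top score."""
    return max(range(len(children)),
               key=lambda i: ucb1(children[i][0], children[i][1], parent_visits, c))
```

With children `[(6.0, 10), (1.0, 2)]` and 12 parent visits, the second child is selected even though its mean is lower (0.5 versus 0.6), because its exploration bonus is larger. Setting `c=0.0` removes the bonus and selects the first child instead.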
Root node
Starting point of the MCTS search tree, representing the current state of the problem. Every simulation starts from this node, and the final decision is chosen among its children.
Search tree
Hierarchical data structure explored by MCTS, containing possible states and actions. It is built incrementally during the search and represents the part of the decision space explored so far.
Default Policy
Strategy used during the simulation phase to select actions beyond the part of the tree where statistics are available. Typically a random policy or a simple heuristic that completes trajectories cheaply.
Q-value
Estimate of the quality of an action in a given state, calculated as the average of the rewards accumulated over the simulations that passed through it. Serves as the main metric for evaluating nodes in MCTS.
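The average does not require storing every reward. A running mean can be updated in place, as in the sketch below; `update_q` is a hypothetical helper name.

```python
from typing import Tuple

def update_q(q: float, visits: int, reward: float) -> Tuple[float, int]:
    """Fold one new reward into a running average: Q <- Q + (r - Q) / N."""
    visits += 1
    q += (reward - q) / visits
    return q, visits

# Usage: four simulated rewards, three of them wins.
q, n = 0.0, 0
for r in (1.0, 0.0, 1.0, 1.0):
    q, n = update_q(q, n, r)
```

After the loop `n` is 4 and `q` equals 0.75, the plain average of the four rewards. Keeping a reward sum and dividing by the visit count gives the same value; the incremental form just avoids holding the sum.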
Visits
Count of how many times a node has been selected and explored during the search. Influences the exploration term in UCB1 and the confidence in value estimates.
Heuristic
Domain-specific knowledge used to guide the search in MCTS and improve efficiency. Can influence selection, expansion, or simulation depending on the implementation.
Planning
Process of choosing a sequence of actions by anticipating future states. MCTS is well suited to sequential planning under uncertainty because it evaluates candidate actions through repeated simulations.
Transition Model
Function that predicts the next state given a current state and action in a model-based environment. Essential for simulations in MCTS without real interaction.
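A transition model can be as small as one pure function. The sketch below assumes a hypothetical 4x4 grid world with four moves. The stochastic variant, where a random move occasionally replaces the chosen one, is an added illustration of how a model may encode uncertainty.

```python
import random
from typing import Tuple

State = Tuple[int, int]  # (x, y) position on a hypothetical 4x4 grid
SIZE = 4
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def transition(state: State, action: str) -> State:
    """Deterministic model: move one cell, staying in place at the walls."""
    dx, dy = MOVES[action]
    x = min(max(state[0] + dx, 0), SIZE - 1)
    y = min(max(state[1] + dy, 0), SIZE - 1)
    return (x, y)

def noisy_transition(state: State, action: str, rng: random.Random,
                     slip: float = 0.1) -> State:
    """Stochastic variant: with probability `slip`, a random move is used."""
    if rng.random() < slip:
        action = rng.choice(sorted(MOVES))
    return transition(state, action)
```

For example, `transition((0, 0), "right")` returns `(1, 0)`, while `transition((0, 0), "up")` returns `(0, 0)` because the wall blocks the move. Since the function has no side effects, MCTS can call it as often as needed during simulations.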
State
Complete representation of the system's situation at a given moment. Serves as a node in the MCTS tree and the basis for all decisions and simulations.
Action
Possible decision or movement from a given state in the environment. Represented by the edges connecting nodes in the MCTS search tree.
Reward
Numerical signal returned by the environment after an action, indicating its quality. Accumulated during simulations to estimate the value of actions and guide selection.
Time horizon
Maximum depth of simulations, or the number of future steps considered in planning. Affects both the quality of decisions and the computation time required.
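A depth limit can be sketched as a cap on the number of rollout steps. The toy task (a counter that advances by 1 or 2 toward 10 at a cost of -1 per step) and the function name are assumptions chosen only to make the cutoff visible.

```python
import random
from typing import Tuple

GOAL = 10  # hypothetical terminal condition: the counter reaches 10 or more

def rollout_with_horizon(state: int, horizon: int,
                         rng: random.Random) -> Tuple[float, int]:
    """Random rollout that stops at a terminal state or after `horizon` steps.

    Returns the accumulated reward and the number of steps actually taken.
    """
    total_reward = 0.0
    steps = 0
    while state < GOAL and steps < horizon:
        state += rng.choice((1, 2))   # random default policy
        total_reward -= 1.0           # hypothetical cost of one step
        steps += 1
    return total_reward, steps
```

With `horizon=3` the rollout from state 0 is always cut off after three steps, well before the goal. With a large horizon it runs until the goal is reached, which takes between 5 and 10 steps. A short horizon is cheaper per simulation but ignores rewards that lie beyond the cutoff.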
Convergence
Property of MCTS, established for the UCT variant under standard assumptions, whereby the estimated value of actions tends toward the optimal value as the number of simulations goes to infinity. Ensures the asymptotic reliability of the algorithm.