Monte Carlo Tree Search Planning

Concepts

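Monte Carlo Tree Search (MCTS)

Heuristic search algorithm that builds a decision tree incrementally and evaluates candidate actions with random simulations in a model of the environment. It combines tree search with Monte Carlo evaluation to find strong decisions, which become optimal in the limit. Each iteration runs four phases: selection, expansion, simulation, and backpropagation.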
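
MCTS repeats four phases: selection, expansion, simulation, and backpropagation. The compact UCT-style sketch below puts them together. It makes several assumptions. The problem is a hypothetical single-player toy: guess the 3-bit string `TARGET`, with a reward equal to the fraction of matching bits. The default policy is uniformly random, the UCB1 exploration constant is c = √2, and the search returns the most visited root child, which is a common convention. `actions`, `step`, `reward`, `Node`, and `mcts` are illustrative names and not a library API.

```python
import math
import random

TARGET = (1, 0, 1)  # hypothetical toy problem: guess this 3-bit string

def actions(state):
    # A non-terminal state allows appending a 0 or a 1.
    return [] if len(state) == len(TARGET) else [0, 1]

def step(state, action):
    # Deterministic transition model: append the chosen bit.
    return state + (action,)

def reward(state):
    # Fraction of bits matching the target, in [0, 1].
    return sum(a == b for a, b in zip(state, TARGET)) / len(TARGET)

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action           # action that led here from the parent
        self.children = []
        self.untried = actions(state)  # legal actions not yet expanded
        self.visits = 0
        self.value_sum = 0.0

    def ucb1(self, c=math.sqrt(2)):
        mean = self.value_sum / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def mcts(root_state, iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and has children.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ch.ucb1())
        # 2. Expansion: add one child for a randomly chosen untried action.
        if node.untried:
            a = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(step(node.state, a), parent=node, action=a)
            node.children.append(child)
            node = child
        # 3. Simulation: uniformly random default policy until a terminal state.
        state = node.state
        while actions(state):
            state = step(state, rng.choice(actions(state)))
        r = reward(state)
        # 4. Backpropagation: update statistics from the new node up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += r
            node = node.parent
    # Final decision: the most visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).action

print(mcts(()))
```

With these settings the search should settle on appending bit 1 first, which is the first bit of `TARGET`. Choosing the most visited child rather than the highest mean is a common choice because visit counts are less sensitive to a few lucky simulations.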

Expansion

Phase of MCTS in which one (or more) new child nodes are added to the search tree below the non-terminal node reached during selection. This lets the search explore new decision branches in the state space.
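
The minimal sketch below shows the expansion step on its own. It assumes each node keeps a list of untried actions and that one of them is expanded per iteration, chosen at random. The counter domain and the names `Node`, `expand`, `legal`, and `move` are hypothetical.

```python
import random

class Node:
    def __init__(self, state, untried, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action          # action that led from the parent to this node
        self.children = []
        self.untried = list(untried)  # legal actions not yet expanded
        self.visits = 0
        self.value_sum = 0.0

def expand(node, legal_actions, transition, rng):
    """Expand one randomly chosen untried action of `node` and return the new child."""
    if not node.untried:
        raise ValueError("node is terminal or already fully expanded")
    action = node.untried.pop(rng.randrange(len(node.untried)))
    next_state = transition(node.state, action)
    child = Node(next_state, legal_actions(next_state), parent=node, action=action)
    node.children.append(child)
    return child

# Hypothetical toy domain: add 1 or 2 to a counter until it reaches at least 3.
def legal(s):
    return [1, 2] if s < 3 else []

def move(s, a):
    return s + a

root = Node(0, legal(0))
child = expand(root, legal, move, random.Random(0))
print(child.action, child.state, root.untried)
```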

Simulation

Process of playing out a trajectory from a newly expanded node to a terminal state (or to a depth limit) to evaluate the quality of an action. Also called a rollout, it uses a default policy to generate the trajectory and returns the reward obtained.
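
The sketch below shows a rollout with a uniformly random default policy. Averaging many rollouts gives a Monte Carlo estimate of a state's value. The counter domain, its reward of 1 for landing exactly on 10, and the names `rollout`, `legal`, `move`, and `score` are hypothetical. The `max_steps` cap is an assumed safeguard against very long playouts.

```python
import random

def rollout(state, legal_actions, transition, reward, rng, max_steps=10_000):
    """Follow a uniformly random default policy until a terminal state
    (or `max_steps` moves) and return the reward of the final state."""
    for _ in range(max_steps):
        actions = legal_actions(state)
        if not actions:          # terminal state: stop the playout
            break
        state = transition(state, rng.choice(actions))
    return reward(state)

# Hypothetical toy domain: add 1 or 2 to a counter; landing exactly on 10 scores 1.
def legal(s):
    return [1, 2] if s < 10 else []

def move(s, a):
    return s + a

def score(s):
    return 1.0 if s == 10 else 0.0

rng = random.Random(42)
estimate = sum(rollout(0, legal, move, score, rng) for _ in range(1000)) / 1000
print(f"Monte Carlo estimate of the value of state 0: {estimate:.2f}")
```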

Backpropagation

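Phase in which, after a simulation, the statistics of every node on the path from the expanded node back to the root are updated. Each node's visit count is incremented and the simulation's reward is added to its accumulated value, which refines the value estimates.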
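
The minimal sketch below shows backpropagation in a single-player setting. Each node stores a visit count and a reward sum, and its Q-value is their ratio. `Node`, `backpropagate`, and `q` are illustrative names.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0        # number of simulations that passed through this node
        self.value_sum = 0.0   # accumulated reward of those simulations

    @property
    def q(self):
        """Average reward (Q-value estimate); 0.0 for an unvisited node."""
        return self.value_sum / self.visits if self.visits else 0.0

def backpropagate(node, reward):
    """Update every node from `node` up to and including the root."""
    while node is not None:
        node.visits += 1
        node.value_sum += reward
        node = node.parent

root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backpropagate(leaf, 1.0)   # a simulation from the leaf returned reward 1
backpropagate(child, 0.0)  # a simulation from the child returned reward 0
print(root.visits, root.q, leaf.visits)
```

In two-player zero-sum games, a common variant negates the reward at each level. Each node then stores the value from the perspective of the player who moved into it. This sketch does not do that.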

Selection

First phase of each MCTS iteration. The algorithm descends from the root by following a tree policy until it reaches a node that is not fully expanded or is terminal. The tree policy typically uses UCB1 (a combination known as UCT) to balance exploration and exploitation.
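
The sketch below shows the selection descent using UCB1 as the tree policy. It assumes every existing child has been visited at least once, which holds when each expansion is immediately followed by a simulation and backpropagation. The names and the hand-built statistics are hypothetical.

```python
import math

class Node:
    def __init__(self, parent=None, untried=()):
        self.parent = parent
        self.children = []
        self.untried = list(untried)   # legal actions not yet expanded
        self.visits = 0
        self.value_sum = 0.0
        if parent is not None:
            parent.children.append(self)

def ucb1(child, c=math.sqrt(2)):
    """Mean reward plus exploration bonus; assumes child.visits >= 1."""
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(child.parent.visits) / child.visits)
    return exploit + explore

def select(root):
    """Descend by maximal UCB1 until reaching a node with untried actions or no children."""
    node = root
    while not node.untried and node.children:
        node = max(node.children, key=ucb1)
    return node

# Hand-built statistics: two equally visited children with different mean rewards.
root = Node()
root.visits = 10
a = Node(parent=root)
a.visits, a.value_sum = 5, 4.0   # mean 0.8
b = Node(parent=root)
b.visits, b.value_sum = 5, 1.0   # mean 0.2
print(select(root) is a)
```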

UCB1

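Selection formula used in MCTS that adds an exploration bonus to a child node's average value: UCB1 = Q̄_j + c·√(ln N / n_j). Here Q̄_j is the child's mean reward, n_j its visit count, N the parent's visit count, and c an exploration constant (√2 in the original formulation, for rewards in [0, 1]). When used as the tree policy (UCT), it guarantees that the choice of action converges to the optimal one as the number of simulations grows.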
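
The sketch below computes UCB1 for two children of the same parent and shows how the exploration term can favour a rarely visited child over one with a higher mean. It uses c = √2. Returning infinity for an unvisited child is an assumed convention, so that unvisited children are tried first.

```python
import math

def ucb1(mean, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean reward plus an exploration bonus that shrinks
    as the child is visited more often relative to its parent."""
    if child_visits == 0:
        return math.inf   # unvisited children are tried first
    return mean + c * math.sqrt(math.log(parent_visits) / child_visits)

# A well-explored child with a higher mean vs. a rarely visited child (parent has 100 visits).
print(ucb1(0.6, 80, 100))
print(ucb1(0.5, 5, 100))
```

The first child scores about 0.94 and the second about 1.86. Selection therefore picks the less-visited child despite its lower mean. As visits accumulate, the bonus shrinks and the means dominate.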

Root node

Starting point of the MCTS search tree, representing the current state of the problem. Every iteration begins at this node, and the action finally returned is chosen among its children (typically the most visited one).

Search tree

Hierarchical data structure explored by MCTS, whose nodes are states and whose edges are actions. It is built incrementally during the search and represents the part of the decision space explored so far.

Default Policy

Strategy used during the simulation phase to select actions once the search has left the part of the state space covered by the tree. It is typically a uniformly random policy or a cheap heuristic, so that complete trajectories can be generated quickly.
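
The sketch below builds a default policy in either form. With no heuristic it is uniformly random. With a heuristic it is epsilon-greedy: it takes the heuristic's best action most of the time and a random one with probability `epsilon`. The epsilon-greedy scheme, `make_default_policy`, and the step-size domain are assumptions for illustration.

```python
import random

def make_default_policy(legal_actions, heuristic=None, epsilon=0.1, rng=None):
    """Return a rollout policy: uniformly random if no heuristic is given,
    otherwise epsilon-greedy with respect to the heuristic score."""
    if rng is None:
        rng = random.Random()

    def policy(state):
        actions = legal_actions(state)
        if heuristic is None or rng.random() < epsilon:
            return rng.choice(actions)                           # random move
        return max(actions, key=lambda a: heuristic(state, a))   # cheap greedy move
    return policy

# Hypothetical domain: actions are step sizes; the heuristic prefers larger steps.
def legal(state):
    return [1, 2, 3]

def prefer_large(state, action):
    return action

greedy = make_default_policy(legal, prefer_large, epsilon=0.0, rng=random.Random(0))
uniform = make_default_policy(legal, rng=random.Random(0))
print(greedy(0), uniform(0))
```

A cheap heuristic usually makes rollouts more informative. A costly one reduces the number of simulations that fit in the time budget.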

Q-value

Estimate of the quality of taking an action in a given state. In MCTS it is computed as the average reward of the simulations that passed through the corresponding node, and it serves as the main metric for evaluating nodes.
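
In symbols, write N(s,a) for the visit count, W(s,a) for the accumulated reward, and r_i for the reward of the i-th simulation through the node; this notation is chosen here for illustration. The average and its equivalent incremental update after the n-th reward are:

```latex
Q(s,a) = \frac{W(s,a)}{N(s,a)} = \frac{1}{N(s,a)} \sum_{i=1}^{N(s,a)} r_i,
\qquad
Q_n = \frac{(n-1)\,Q_{n-1} + r_n}{n} = Q_{n-1} + \frac{r_n - Q_{n-1}}{n}
```

The incremental form avoids storing individual rewards. Only the current mean and the visit count are needed.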

Visits

Count of how many times a node has been traversed during the search, that is, how many simulations passed through it. It drives the exploration term of UCB1 and indicates how much confidence can be placed in the node's value estimate.

Heuristic

Domain-specific knowledge used to guide the search in MCTS and improve efficiency. Can influence selection, expansion, or simulation depending on the implementation.

Planning

Process of choosing a sequence of actions by anticipating future states. MCTS is well suited to sequential planning under uncertainty because it estimates action values through repeated simulations.

Transition Model

Function that predicts the next state (or a distribution over next states) given the current state and an action. MCTS relies on such a model, or on a simulator, to run its simulations without interacting with the real environment.
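
The sketch below implements a stochastic transition model as a sampler that MCTS could call during simulations. It is written as an explicit table of outcome probabilities. The two-state MDP in `MODEL`, its rewards, and `sample_transition` are hypothetical.

```python
import random

# Hypothetical two-state MDP: MODEL[state][action] -> list of (probability, next_state, reward)
MODEL = {
    "low":  {"wait": [(1.0, "low", 0.0)],
             "work": [(0.7, "high", 1.0), (0.3, "low", 0.0)]},
    "high": {"wait": [(1.0, "high", 0.5)],
             "work": [(0.9, "high", 1.0), (0.1, "low", -1.0)]},
}

def sample_transition(state, action, rng):
    """Draw (next_state, reward) from the model without touching a real environment."""
    outcomes = MODEL[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, r = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, r

rng = random.Random(0)
print(sample_transition("low", "work", rng))
```

A deterministic model is the special case in which every action has a single outcome with probability 1.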

State

Complete representation of the system's situation at a given moment. Each node of the MCTS tree corresponds to a state, which is the basis for all decisions and simulations.

Action

Possible decision or move available from a given state in the environment. Actions are represented by the edges connecting nodes in the MCTS search tree.

Reward

Numerical signal returned by the environment after an action, evaluating its quality. Rewards are accumulated during simulations to estimate the value of actions and to guide selection.

Time horizon

Maximum depth of the simulations, or number of future steps considered during planning. It affects both the quality of decisions and the computation time required.
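
The sketch below shows a depth-limited rollout that stops after `horizon` steps or at a terminal state, whichever comes first. It returns the optionally discounted sum of rewards. The discount factor `gamma`, the corridor domain, and the helper names are assumptions for illustration.

```python
def truncated_return(state, policy, step, is_terminal, horizon, gamma=1.0):
    """Follow `policy` for at most `horizon` steps (or until a terminal state)
    and return the discounted sum r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        if is_terminal(state):
            break
        state, r = step(state, policy(state))
        total += discount * r
        discount *= gamma
    return total

# Hypothetical corridor: each step moves right and earns reward 1; position 100 ends the episode.
def step(pos, action):
    return pos + 1, 1.0

def at_end(pos):
    return pos >= 100

def always_right(pos):
    return "right"

print(truncated_return(0, always_right, step, at_end, horizon=5))
print(truncated_return(0, always_right, step, at_end, horizon=3, gamma=0.5))
print(truncated_return(98, always_right, step, at_end, horizon=5))
```

A longer horizon captures more distant consequences but makes each simulation more expensive. That is the trade-off between decision quality and computation time described above.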

Convergence

Property of MCTS (established for the UCT variant) guaranteeing that the estimated values of actions tend toward the optimal values as the number of simulations grows without bound. It ensures the algorithm's reliability, but only asymptotically.
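
Convergence results for UCT build on the behaviour of UCB1 at a single node, which is a multi-armed bandit. The sketch below illustrates that behaviour rather than proving anything. As the number of pulls grows, the share of pulls spent on the best arm increases. The Bernoulli arm probabilities and `ucb1_bandit` are hypothetical.

```python
import math
import random

def ucb1_bandit(means, pulls, seed=0, c=math.sqrt(2)):
    """Run UCB1 on Bernoulli arms with the given success probabilities
    and return how many times each arm was pulled."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(pulls):             # t = number of pulls made so far
        if t < k:
            arm = t                    # pull every arm once to initialise
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + c * math.sqrt(math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

for n in (100, 1_000, 10_000):
    counts = ucb1_bandit([0.2, 0.5, 0.8], n)
    print(n, [round(x / n, 3) for x in counts])
```

Suboptimal arms keep being sampled, but only about logarithmically often. This is why the estimates keep improving while most of the budget goes to the best action.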

AI Glossary