Hierarchical Reinforcement Learning

Terms

Hierarchical Reinforcement Learning (HRL)

Reinforcement learning paradigm structuring policies into multiple hierarchical levels where meta-policies control specialized sub-policies to solve complex tasks in a modular manner.

Options Framework

Formalism introduced by Sutton, Precup, and Singh (1999) that generalizes primitive actions to temporally extended actions called options. Each option consists of an intra-option policy, an initiation set (the states in which the option may start), and a termination condition.
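
A minimal sketch of the three components as a data structure, assuming integer states and actions. The `Option` class, `run_option`, and the corridor environment are hypothetical names used only for illustration, not part of any library.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Option:
    """An option (I, pi, beta) over integer states and actions."""
    initiation_set: Set[int]             # I: states where the option may start
    policy: Callable[[int], int]         # pi: state -> primitive action
    termination: Callable[[int], float]  # beta: state -> probability of stopping

    def can_start(self, state: int) -> bool:
        return state in self.initiation_set


def run_option(option, state, step, rng, max_steps=100):
    """Follow the option's policy until beta fires; return (state, steps)."""
    if not option.can_start(state):
        raise ValueError(f"option cannot be initiated in state {state}")
    steps = 0
    while steps < max_steps:
        state = step(state, option.policy(state))
        steps += 1
        if rng.random() < option.termination(state):
            break
    return state, steps


# Hypothetical 1-D corridor with states 0..5; action +1 moves right, -1 left.
def corridor_step(state: int, action: int) -> int:
    return max(0, min(5, state + action))


go_right = Option(
    initiation_set={0, 1, 2, 3, 4},
    policy=lambda s: 1,                             # always move right
    termination=lambda s: 1.0 if s == 5 else 0.0,   # stop only at the right end
)

final_state, steps = run_option(go_right, 2, corridor_step, random.Random(0))
```

From the caller's point of view, `run_option` behaves like a single action that happens to last several time steps, which is the sense in which options generalize primitive actions.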

Meta-controller

High-level policy in HRL responsible for selecting and activating appropriate sub-policies based on global objectives and the current state of the environment.

Sub-controller

Low-level policy executing primitive actions or specific skills under the supervision of the meta-controller to accomplish localized sub-tasks.

Temporal Abstraction

Fundamental principle in HRL that groups sequences of actions into coherent, temporally extended units (options), so that the agent decides less often and learns over a shorter effective horizon.

Feudal Reinforcement Learning

Hierarchical architecture inspired by feudal systems where high-level managers define goals for low-level workers who locally optimize their rewards.

MAXQ Framework

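HRL approach (Dietterich, 2000) that decomposes the value function of a hierarchical policy into additive contributions of sub-tasks. The task hierarchy itself is supplied by the designer rather than discovered automatically; the decomposition makes sub-task policies and value functions reusable across parent tasks.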
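
The additive decomposition can be written as follows, in notation assumed here from Dietterich's formulation: i is a parent sub-task, a is a child sub-task or primitive action invoked in state s, and C is the completion function, the expected discounted reward for finishing i after a has completed.

```latex
Q^{\pi}(i, s, a) = V^{\pi}(a, s) + C^{\pi}(i, s, a),
\qquad
V^{\pi}(i, s) =
\begin{cases}
Q^{\pi}\bigl(i, s, \pi_i(s)\bigr) & \text{if } i \text{ is composite},\\[4pt]
\sum_{s'} P(s' \mid s, i)\, R(s' \mid s, i) & \text{if } i \text{ is primitive}.
\end{cases}
```

Applying the first equation recursively from the root task down to a primitive action expresses the root value as a sum of one primitive reward term and one completion term per level.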

Goal-conditioned Policies

Policies parameterized by specific goals allowing agents to learn generalizable behaviors that can be reused for different sub-objectives.

Intrinsic Motivation

Mechanism generating internal rewards based on novelty, curiosity or mastery to guide autonomous exploration of hierarchical skills.
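
As one illustration of a novelty-based internal reward, the sketch below uses a count-based bonus that shrinks as a state is revisited. This is only one of several possible forms of intrinsic motivation; the class name `CountBonus` and the `scale` parameter are assumptions made for the example.

```python
import math
from collections import Counter


class CountBonus:
    """Intrinsic reward scale / sqrt(N(s)), where N(s) counts visits to s."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.counts = Counter()

    def reward(self, state):
        self.counts[state] += 1  # register the visit before computing the bonus
        return self.scale / math.sqrt(self.counts[state])


bonus = CountBonus(scale=1.0)
first = bonus.reward("room_a")   # first visit: full bonus
second = bonus.reward("room_a")  # repeat visit: smaller bonus
other = bonus.reward("room_b")   # unseen state: full bonus again
```

In a hierarchical agent such a bonus would typically be added to, or used in place of, the environment reward at the level that is supposed to explore.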

Skill Discovery

Automatic process of identifying and extracting reusable behaviors (skills) from interaction with the environment without explicit external supervision.

Hierarchical Actor-Critic (HAC)

HRL architecture combining multi-level actor-critics where each level simultaneously learns a policy and a value function for its respective time horizon.

Hierarchical Deep Q-Network (hDQN)

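Hierarchical extension of DQN (Kulkarni et al., 2016) using separate value networks for the high-level and low-level policies: the meta-controller selects goals to maximize extrinsic reward, and the controller learns to reach each selected goal from an intrinsic reward.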

State Abstraction

Technique that shrinks the effective state space at each hierarchical level by grouping states that are equivalent for that level's task and ignoring features irrelevant to it, improving learning efficiency.

Termination Function

Function, commonly written β(s), giving the probability that an option stops in state s and returns control to the upper level; crucial for temporal coordination between hierarchical levels.

Initiation Function

Function defining the conditions under which an option can be activated, ensuring that sub-policies only execute in appropriate states.

Policy over Options

High-level policy that selects among available options rather than primitive actions, forming the decision core of HRL systems.
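
A tabular sketch of how such a policy might be learned and used, assuming SMDP Q-learning with epsilon-greedy selection (one common choice, not the only one). The function names and the state and option labels are hypothetical.

```python
import random
from collections import defaultdict


def select_option(q, state, available, epsilon, rng):
    """Epsilon-greedy choice among the options available in `state`."""
    if rng.random() < epsilon:
        return rng.choice(available)
    return max(available, key=lambda o: q[(state, o)])


def smdp_q_update(q, state, option, cum_reward, k, next_state, next_available,
                  alpha=0.1, gamma=0.9):
    """SMDP Q-learning update after `option` ran for k primitive steps.

    cum_reward is the discounted sum of rewards collected during those k steps.
    """
    best_next = max((q[(next_state, o)] for o in next_available), default=0.0)
    target = cum_reward + (gamma ** k) * best_next  # discount by option duration
    q[(state, option)] += alpha * (target - q[(state, option)])
    return q[(state, option)]


q = defaultdict(float)
q[("door", "exit")] = 10.0  # pretend this value was learned earlier

new_value = smdp_q_update(q, "start", "go_to_door", cum_reward=-3.0, k=3,
                          next_state="door", next_available=["exit", "wait"])
choice = select_option(q, "door", ["exit", "wait"], epsilon=0.0,
                       rng=random.Random(0))
```

The only difference from one-step Q-learning is the factor `gamma ** k`, which accounts for the number of primitive steps the option took.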

Hindsight Experience Replay (HER)

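Technique that augments the replay buffer by relabeling stored transitions with goals that were actually achieved, so that a failed attempt at the original goal becomes a successful example for an alternative goal. It is used in hierarchical methods such as HAC to train goal-conditioned levels from sparse rewards.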
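
The relabeling step can be sketched as below, assuming the "final" strategy (substitute the last state reached for the goal) and a sparse reward of 0 on success and -1 otherwise. The tuple layout and function names are assumptions for the example.

```python
def relabel_final(episode, reward_fn):
    """Relabel an episode with the state it actually reached.

    episode: list of (state, action, next_state, goal) tuples.
    reward_fn(next_state, goal) -> float.
    Returns (state, action, next_state, new_goal, new_reward) tuples.
    """
    achieved = episode[-1][2]  # last next_state becomes the substitute goal
    return [(s, a, s2, achieved, reward_fn(s2, achieved))
            for s, a, s2, _ in episode]


def sparse_reward(next_state, goal):
    return 0.0 if next_state == goal else -1.0


# The agent aimed for state 5 but only reached state 3.
episode = [(0, +1, 1, 5), (1, +1, 2, 5), (2, +1, 3, 5)]

original = [(s, a, s2, g, sparse_reward(s2, g)) for s, a, s2, g in episode]
relabeled = relabel_final(episode, sparse_reward)
replay_buffer = original + relabeled  # store both versions
```

Under the original goal every transition carries the failure reward; after relabeling, the last transition carries the success reward, which gives the learner a non-trivial signal from an otherwise failed episode.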

Subgoal Discovery

Process of automatically identifying relevant intermediate states that serve as natural transition points between hierarchical decision-making levels.

Hierarchical Policy Gradient

Policy-gradient method adapted to hierarchical policies, in which the parameters of several decision levels are updated from gradient estimates of the expected return.

Option-Critic Architecture

End-to-end framework (Bacon, Harb, and Precup, 2017) that simultaneously learns intra-option policies, termination functions, and the policy over options with gradient-based updates, without requiring predefined subgoals.
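
The quantities the critic estimates can be sketched as follows, in notation assumed here from the option-critic formulation: ω is an option, π_ω its intra-option policy, β_ω its termination function, and V_Ω the value of the policy over options.

```latex
\begin{aligned}
Q_\Omega(s, \omega) &= \sum_{a} \pi_{\omega}(a \mid s)\, Q_U(s, \omega, a), \\
Q_U(s, \omega, a)   &= r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, U(\omega, s'), \\
U(\omega, s')       &= \bigl(1 - \beta_{\omega}(s')\bigr)\, Q_\Omega(s', \omega)
                       + \beta_{\omega}(s')\, V_\Omega(s').
\end{aligned}
```

The last line shows where termination enters: on arrival in s' the option either continues with probability 1 - β_ω(s'), keeping its own value, or terminates and hands control back to the policy over options.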
