AI Glossary

Complete glossary of artificial intelligence

162 categories, 2,032 subcategories, 23,060 terms

Hierarchical Reinforcement Learning (HRL)

Reinforcement learning paradigm structuring policies into multiple hierarchical levels where meta-policies control specialized sub-policies to solve complex tasks in a modular manner.

Options Framework

Formalism introduced by Sutton, Precup, and Singh (1999) that generalizes primitive actions into temporally extended options, each consisting of an intra-option policy, an initiation set, and a termination condition.
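
The three components of an option can be sketched as a small data structure. This is a minimal illustration rather than a reference implementation: the corridor environment, the `step` function, and the `go_right` option are hypothetical names introduced only for this example, and termination is assumed to be deterministic.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple


@dataclass
class Option:
    """An option as a triple: initiation set, intra-option policy, termination."""
    initiation_set: Set[int]             # states where the option may start
    policy: Callable[[int], int]         # maps state -> primitive action
    termination: Callable[[int], float]  # maps state -> probability of stopping


def step(state: int, action: int) -> int:
    """Hypothetical corridor with states 0..5; action is -1 (left) or +1 (right)."""
    return max(0, min(5, state + action))


def run_option(option: Option, state: int, max_steps: int = 20) -> Tuple[int, int]:
    """Execute the option until it terminates; return (final state, steps taken)."""
    if state not in option.initiation_set:
        raise ValueError("option is not available in this state")
    steps = 0
    while steps < max_steps:
        state = step(state, option.policy(state))
        steps += 1
        # Assumption: deterministic termination, so beta(s) is 0.0 or 1.0.
        if option.termination(state) >= 1.0:
            break
    return state, steps


go_right = Option(
    initiation_set={0, 1, 2, 3, 4},
    policy=lambda s: +1,
    termination=lambda s: 1.0 if s == 5 else 0.0,
)

final_state, duration = run_option(go_right, state=2)
print(final_state, duration)  # 5 3
```

From the caller's point of view the whole option behaves like a single action that happens to last `duration` primitive steps.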

Meta-controller

High-level policy in HRL responsible for selecting and activating appropriate sub-policies based on global objectives and the current state of the environment.

Sub-controller

Low-level policy executing primitive actions or specific skills under the supervision of the meta-controller to accomplish localized sub-tasks.

Temporal Abstraction

Fundamental principle in HRL that groups sequences of actions into coherent, temporally extended units (options), shortening the effective decision horizon and easing credit assignment over long tasks.

Feudal Reinforcement Learning

Hierarchical architecture inspired by feudal systems where high-level managers define goals for low-level workers who locally optimize their rewards.

MAXQ Framework

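HRL approach decomposing the value function of a hierarchical policy into additive contributions from sub-tasks, so that sub-task value functions can be shared and reused. The task hierarchy itself is typically specified by the designer rather than discovered automatically.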
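
The additive decomposition can be written out as follows. The notation is an assumption chosen for this sketch: $i$ is a parent sub-task, $a$ a child sub-task or primitive action invoked by $i$, $\pi_i$ the policy of sub-task $i$, $N$ the number of primitive steps the child takes, and $C^\pi$ the completion function.

```latex
\begin{aligned}
Q^\pi(i, s, a) &= V^\pi(a, s) + C^\pi(i, s, a), \\
V^\pi(i, s) &=
  \begin{cases}
    Q^\pi\bigl(i, s, \pi_i(s)\bigr) & \text{if } i \text{ is a composite sub-task}, \\
    \sum_{s'} P(s' \mid s, i)\, R(s' \mid s, i) & \text{if } i \text{ is a primitive action},
  \end{cases} \\
C^\pi(i, s, a) &= \sum_{s', N} P_i^\pi(s', N \mid s, a)\, \gamma^{N}\, Q^\pi\bigl(i, s', \pi_i(s')\bigr).
\end{aligned}
```

Read this way, the value of invoking child $a$ inside parent $i$ is the reward collected while $a$ runs, plus the discounted reward for completing $i$ afterwards.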

Goal-conditioned Policies

Policies that take a goal as an additional input, allowing agents to learn generalizable behaviors that can be reused across different sub-objectives.

Intrinsic Motivation

Mechanism generating internal rewards based on novelty, curiosity or mastery to guide autonomous exploration of hierarchical skills.
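
One simple novelty-based variant is a count-based bonus that shrinks as a state is revisited. The sketch below assumes hashable states and a bonus of `scale / sqrt(visits)`. The class name `CountBonus` is hypothetical, and this is only one of many possible intrinsic reward signals.

```python
import math
from collections import Counter


class CountBonus:
    """Novelty bonus that decays with the visit count of each state."""

    def __init__(self, scale: float = 1.0):
        self.scale = scale
        self.counts = Counter()

    def reward(self, state) -> float:
        # Record the visit first, so the first visit yields the full bonus.
        self.counts[state] += 1
        return self.scale / math.sqrt(self.counts[state])


bonus = CountBonus()
rewards = [bonus.reward(s) for s in ["a", "a", "b", "a", "a"]]
print(rewards[0], rewards[2], rewards[4])  # 1.0 1.0 0.5
```

In practice such a bonus is usually added to the environment reward, which pushes the agent toward rarely visited states.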

Skill Discovery

Automatic process of identifying and extracting reusable behaviors (skills) from interaction with the environment without explicit external supervision.

Hierarchical Actor-Critic (HAC)

HRL architecture combining multi-level actor-critics where each level simultaneously learns a policy and a value function for its respective time horizon.

Hierarchical Deep Q-Network (hDQN)

Hierarchical extension of DQN using separate value networks for the high-level and low-level policies: the meta-controller selects goals, and the controller learns to reach them from intrinsic rewards.

State Abstraction

Technique reducing state dimensionality by grouping similar observations relevant for each hierarchical level, improving learning efficiency.
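
As a toy illustration, assume a 4x4 grid world split into 2x2 rooms: the high level only needs to know which room the agent is in, while the low level keeps the exact cell. The function name `abstract_state` and the grid layout are assumptions made for this sketch.

```python
from typing import Tuple


def abstract_state(cell: Tuple[int, int], room_size: int = 2) -> Tuple[int, int]:
    """Map a (row, col) grid cell to the (row, col) index of the room containing it."""
    row, col = cell
    return (row // room_size, col // room_size)


cells = [(r, c) for r in range(4) for c in range(4)]
rooms = {abstract_state(cell) for cell in cells}
print(len(cells), len(rooms))  # 16 4
```

The high-level policy then learns over 4 abstract states instead of 16 concrete ones.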

Termination Function

Function determining when an option should stop and return control to the upper level, crucial for temporal coordination between hierarchical levels.

Initiation Function

Function defining the conditions under which an option can be activated, ensuring that sub-policies only execute in appropriate states.

Policy over Options

High-level policy that selects among available options rather than primitive actions, forming the decision core of HRL systems.
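
A tabular sketch of such a policy: epsilon-greedy selection restricted to the options available in the current state, followed by an SMDP-style Q-learning update once the chosen option has finished. The state and option names (`"start"`, `"door"`, `"go_to_door"`, `"exit"`, `"wander"`) and the hyperparameters are hypothetical.

```python
import random
from collections import defaultdict


def select_option(q, state, available, epsilon=0.1, rng=random):
    """Epsilon-greedy choice among the options available in `state`."""
    if rng.random() < epsilon:
        return rng.choice(available)
    return max(available, key=lambda o: q[(state, o)])


def smdp_q_update(q, state, option, discounted_return, duration,
                  next_state, next_available, alpha=0.5, gamma=0.9):
    """Update Q(state, option) after the option ran for `duration` primitive steps.

    `discounted_return` is the discounted sum of rewards collected while the
    option was executing.
    """
    best_next = max((q[(next_state, o)] for o in next_available), default=0.0)
    target = discounted_return + gamma ** duration * best_next
    q[(state, option)] += alpha * (target - q[(state, option)])


q = defaultdict(float)
q[("door", "exit")] = 1.0
smdp_q_update(q, "start", "go_to_door", discounted_return=0.0, duration=3,
              next_state="door", next_available=["exit"])
choice = select_option(q, "start", ["go_to_door", "wander"], epsilon=0.0)
print(choice)  # go_to_door
```

The discount is raised to the option's duration, which is what separates this update from ordinary one-step Q-learning.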

Hindsight Experience Replay (HER)

Technique that augments replayed experience by relabeling failed trajectories with goals they actually achieved, turning them into successes for those alternative goals. It is commonly combined with goal-conditioned and hierarchical methods.
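
The relabeling step can be sketched as follows, assuming integer states, a sparse reward of 0 on reaching the goal and -1 otherwise, and a "final" strategy that substitutes the last state reached in the episode as the goal. The tuple layout and function names are assumptions made for this example.

```python
from typing import List, Tuple

# (state, action, next_state, goal, reward)
Transition = Tuple[int, int, int, int, float]


def sparse_reward(next_state: int, goal: int) -> float:
    """0 when the goal is reached, -1 otherwise."""
    return 0.0 if next_state == goal else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Replace the original goal with the state the episode actually ended in."""
    achieved = episode[-1][2]
    return [
        (s, a, s2, achieved, sparse_reward(s2, achieved))
        for (s, a, s2, _goal, _reward) in episode
    ]


# The agent aimed for state 5 but only reached state 3.
episode = [(0, 1, 1, 5, -1.0), (1, 1, 2, 5, -1.0), (2, 1, 3, 5, -1.0)]
relabeled = relabel_with_final_goal(episode)
replay_buffer = episode + relabeled
print(relabeled[-1])  # (2, 1, 3, 3, 0.0)
```

The original episode contains no successful transition, whereas the relabeled copy ends with a zero-reward success for goal 3.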

Subgoal Discovery

Process of automatically identifying relevant intermediate states that serve as natural transition points between hierarchical decision-making levels.

Hierarchical Policy Gradient

Gradient optimization method adapted for hierarchical policies, propagating gradients through multiple decision levels simultaneously.

Option-Critic Architecture

End-to-end framework that simultaneously learns intra-option policies, termination functions, and the policy over options using gradient-based updates derived from policy gradient theorems for options.
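
A sketch of the per-step updates in the form usually given for option-critic. The symbols are assumptions not present in the entry above: $\theta$ parameterizes the intra-option policy $\pi_{\omega,\theta}$, $\vartheta$ parameterizes the termination function $\beta_{\omega,\vartheta}$, $Q_U$ is the value of taking action $a$ in state $s$ under option $\omega$, and $Q_\Omega$, $V_\Omega$ are the option-value and state-value functions over options.

```latex
\begin{aligned}
\theta &\leftarrow \theta + \alpha_\theta \,
  \frac{\partial \log \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s, \omega, a), \\
\vartheta &\leftarrow \vartheta - \alpha_\vartheta \,
  \frac{\partial \beta_{\omega,\vartheta}(s')}{\partial \vartheta}\,
  \bigl( Q_\Omega(s', \omega) - V_\Omega(s') \bigr).
\end{aligned}
```

The second line raises the termination probability in states where the current option is worth less than the average over options, and lowers it where the option still has an advantage.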
