
AI Glossary

The Complete Dictionary of Artificial Intelligence

162 Categories · 2,032 Subcategories · 23,060 Terms

Policy Gradient

Direct optimization method that adjusts policy parameters by following the gradient of the expected return, enabling learning of stochastic policies without requiring an environment model.

REINFORCE Algorithm

Basic policy gradient algorithm that updates policy parameters with a Monte Carlo estimate of the gradient computed from complete episodes.
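A minimal sketch of the REINFORCE update on a hypothetical two-armed bandit: a softmax policy over two actions, adjusted by gradient ascent on log π(a) times the sampled return. The rewards, learning rate, and episode count are toy assumptions, not part of any particular library.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reward(action):
    # Assumed toy environment: action 1 pays ~1.0, action 0 pays ~0.0.
    return random.gauss(1.0 if action == 1 else 0.0, 0.1)

random.seed(0)
theta = [0.0, 0.0]           # one logit per action
alpha = 0.1                  # learning rate
for _ in range(2000):        # each "episode" is a single step here
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1
    G = reward(a)            # Monte Carlo return of the episode
    for i in range(2):
        # grad of log pi(a) w.r.t. logit i: 1[i == a] - pi(i)
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += alpha * G * grad

probs = softmax(theta)       # the policy now favors the better-paying action
```

Note that no model of the environment is used anywhere: only sampled actions and returns drive the update.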

Actor-Critic Methods

Hybrid approach combining an actor that learns the policy and a critic that estimates the value function, reducing the variance of policy gradient estimates.
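A single actor-critic update on one transition can be sketched as follows; the critic's TD error stands in for the Monte Carlo return in the actor update, which is where the variance reduction comes from. The states, rewards, and step sizes are assumed toy values.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

V = {"s0": 0.0, "s1": 0.5}       # critic: tabular state values (toy)
theta = [0.2, -0.1]              # actor: logits for two actions in s0
gamma, alpha_v, alpha_pi = 0.99, 0.1, 0.1

# Observed transition: took action 0 in s0, got reward 1.0, landed in s1.
s, a, r, s_next = "s0", 0, 1.0, "s1"

td_error = r + gamma * V[s_next] - V[s]   # critic's error signal
V[s] += alpha_v * td_error                # critic moves toward the TD target
probs = softmax(theta)
for i in range(len(theta)):
    grad = (1.0 if i == a else 0.0) - probs[i]   # grad log pi(a|s)
    theta[i] += alpha_pi * td_error * grad       # actor scaled by TD error
```

Because the TD error was positive here, the actor shifts probability toward the taken action.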

Advantage Function

Measure of how much better an action is than the average action in a given state, computed as the difference between the Q-function and the V-function, A(s, a) = Q(s, a) − V(s), to reduce gradient variance.
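A toy illustration of A(s, a) = Q(s, a) − V(s): V is the policy-weighted average of Q, so each advantage measures an action relative to that average. The Q-values and policy probabilities are assumed numbers.

```python
Q = {"left": 2.0, "right": 4.0}     # action values in one state (toy)
pi = {"left": 0.5, "right": 0.5}    # current policy
V = sum(pi[a] * Q[a] for a in Q)    # V(s) = E_pi[Q(s, a)] = 3.0
A = {a: Q[a] - V for a in Q}        # {"left": -1.0, "right": 1.0}
# Under the policy's own weights, the advantages average to zero.
```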

Proximal Policy Optimization (PPO)

Algorithm optimizing the policy by constraining updates to stay close to the previous policy, using a clipped objective function to ensure learning stability.
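The clipped objective for a single sample can be sketched as below: the probability ratio is clipped to [1 − ε, 1 + ε] and the minimum is taken, so a move that strays too far from the old policy earns no additional objective. The probabilities and advantages are toy inputs.

```python
def ppo_objective(new_p, old_p, advantage, eps=0.2):
    """Clipped surrogate objective for one (state, action) sample."""
    ratio = new_p / old_p
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    # The min makes the objective pessimistic: clipping only ever hurts.
    return min(ratio * advantage, clipped * advantage)

pos = ppo_objective(0.9, 0.5, 1.0)    # ratio 1.8, clipped to 1.2
neg = ppo_objective(0.2, 0.5, -1.0)   # ratio 0.4; min keeps the worse value
```

With a positive advantage the payoff is capped once the ratio exceeds 1 + ε; with a negative advantage the unclipped and clipped terms swap roles, again discouraging large policy shifts.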

Trust Region Policy Optimization (TRPO)

Method ensuring monotonic performance improvements by optimizing the policy within a trust region defined by the KL divergence between successive policies.

Natural Policy Gradient

Variant of policy gradient that preconditions updates with the Fisher information matrix, making them invariant to the policy's parameterization and yielding more stable and efficient convergence.

Policy Network

Parameterized neural network that represents the policy π(a|s; θ), generating a probability distribution over actions conditioned on the current state.
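A minimal policy-network sketch: one linear layer followed by a softmax, mapping a state vector to π(a|s; θ). The weights and state below are arbitrary toy values, not learned parameters.

```python
import math

def policy(state, W, b):
    """Linear layer + softmax: returns pi(a|s) for each action."""
    logits = [sum(w_i * s_i for w_i, s_i in zip(row, state)) + bias
              for row, bias in zip(W, b)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

W = [[0.5, -0.2], [0.1, 0.3]]        # 2 actions x 2 state features (toy)
b = [0.0, 0.0]
probs = policy([1.0, 2.0], W, b)     # a valid distribution over 2 actions
```

Sampling an action from `probs` then gives the stochastic behavior that policy gradient methods rely on.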

Monte Carlo Policy Gradient

A gradient estimation technique that uses complete trajectories to calculate returns, providing an unbiased but high-variance estimate.

Baseline Function

A function subtracted from the return to reduce the variance of the gradient estimate without introducing bias, typically the state-value function.
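A small numerical check of the baseline property, under an assumed two-action setup with deterministic toy returns: the gradient estimate grad log π(a) · (G − b) keeps the same mean for any constant b, but choosing b near V(s) shrinks its variance.

```python
import random

random.seed(0)
p = 0.5                               # pi(action 1) under a softmax policy
rewards = {0: 2.0, 1: 4.0}            # deterministic toy returns

def grad_samples(baseline, n=20000):
    """Samples of grad log pi(a) * (G - baseline) w.r.t. action 1's logit."""
    out = []
    for _ in range(n):
        a = 1 if random.random() < p else 0
        glp = (1.0 if a == 1 else 0.0) - p    # d log pi(a) / d theta_1
        out.append(glp * (rewards[a] - baseline))
    return out

def stats(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

mean_raw, var_raw = stats(grad_samples(baseline=0.0))
mean_base, var_base = stats(grad_samples(baseline=3.0))   # b = V(s) = 3
# Both means estimate the same true gradient (0.5); the variance drops.
```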

Importance Sampling

A technique that allows using data collected with an old policy to update a new policy, by weighting samples according to the probability ratio of the policies.
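The reweighting can be verified on a toy two-action example: an expectation under the new policy is recovered from the old policy's distribution by multiplying each sample's contribution by the ratio π_new(a) / π_old(a). The probabilities and rewards are assumed numbers.

```python
pi_old = {"a": 0.5, "b": 0.5}      # behavior policy (collected the data)
pi_new = {"a": 0.8, "b": 0.2}      # target policy to evaluate
reward = {"a": 1.0, "b": 3.0}

# Exact expectation under pi_new, for reference: 0.8*1 + 0.2*3 = 1.4
direct = sum(pi_new[a] * reward[a] for a in reward)

# The same value, written as an expectation under pi_old with
# importance weights pi_new/pi_old attached to each sample:
weighted = sum(pi_old[a] * (pi_new[a] / pi_old[a]) * reward[a]
               for a in reward)
```

In practice the weights can have high variance when the two policies diverge, which is one motivation for the clipping and KL constraints used by PPO and TRPO.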

Entropy Regularization

Adding an entropy term to the objective function to encourage exploration by penalizing overly deterministic policies, improving the robustness of learning.
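A sketch of the regularized objective: the bonus is β · H(π) with H(π) = −Σ π(a) log π(a), so a near-deterministic policy earns almost no bonus while a spread-out one does. The coefficient and distributions are toy assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def regularized_objective(expected_return, probs, beta=0.01):
    # Objective = return + entropy bonus; beta trades off exploration.
    return expected_return + beta * entropy(probs)

uniform = entropy([0.25] * 4)               # maximal for 4 actions: log(4)
peaked = entropy([0.97, 0.01, 0.01, 0.01])  # near-deterministic, low bonus
```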

Deterministic Policy Gradient

An extension of policy gradient to continuous action spaces where the policy is deterministic, particularly effective in high-dimensional environments.

Stochastic Policy

A policy represented by a probability distribution π(a|s) over actions, which allows for intrinsic exploration and is essential for policy gradient methods.

KL Divergence Constraint

A constraint that limits the Kullback-Leibler divergence between successive policies to ensure stable updates and avoid overly drastic changes in behavior.
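For discrete action sets the constraint can be checked directly: compute D_KL(π_old ‖ π_new) and accept the update only if it stays within a trust-region radius δ. The policies and radius below are assumed toy values.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D_KL(p || q) over a discrete support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

pi_old = [0.5, 0.5]
small_step = [0.55, 0.45]       # modest shift in action probabilities
big_step = [0.95, 0.05]         # overly drastic change in behavior

delta = 0.01                    # trust-region radius (assumed)
ok_small = kl(pi_old, small_step) <= delta   # accepted
ok_big = kl(pi_old, big_step) <= delta       # rejected
```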

Generalized Advantage Estimation (GAE)

An advantage estimation method that trades off bias against variance through an exponentially weighted average of multi-step estimators, offering a tunable compromise for learning.
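The estimator has a compact backward recursion: A_t = Σ_k (γλ)^k · δ_{t+k}, where δ_t is the one-step TD error. A sketch with assumed toy rewards and value estimates:

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation via a single backward pass.

    `values` has one extra entry, V(s_T), to bootstrap the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

adv = gae(rewards=[1.0, 1.0, 1.0], values=[0.5, 0.6, 0.7, 0.0])
# lam = 0 recovers plain one-step TD errors (low variance, more bias):
adv_td = gae([1.0, 1.0, 1.0], [0.5, 0.6, 0.7, 0.0], lam=0.0)
```

At the other extreme, λ = 1 recovers Monte Carlo returns minus the value baseline (low bias, high variance); λ tunes the trade-off between these.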

Policy Gradient Theorem

Fundamental theorem giving an analytical expression for the gradient of the expected return with respect to the policy parameters, forming the theoretical basis of policy gradient methods.
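One common statement of the theorem, in its episodic form:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t
    \right]
```

where the expectation is over trajectories τ sampled from the policy and G_t is the return-to-go from step t. The key point is that the environment's dynamics do not appear: the gradient depends only on log-probabilities of the policy and sampled returns.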

Return-to-Go

Sum of discounted future rewards from a given time step onward, used to weight the gradient estimate in policy gradient algorithms.
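The quantity G_t = Σ_{k ≥ t} γ^(k−t) · r_k is computed for all t with one backward pass over the reward sequence; the rewards and discount below are toy values.

```python
def returns_to_go(rewards, gamma=0.9):
    """Discounted return-to-go at every time step, computed backwards."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

# G_2 = 3, G_1 = 2 + 0.9*3, G_0 = 1 + 0.9*G_1
rtg = returns_to_go([1.0, 2.0, 3.0])
```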
