AI Glossary

The complete dictionary of artificial intelligence

162 categories · 2,032 subcategories · 23,060 terms

Policy Gradient

Direct optimization method that adjusts policy parameters by following the gradient of the expected return, enabling learning of stochastic policies without requiring an environment model.

REINFORCE Algorithm

Basic policy gradient algorithm using a Monte Carlo estimate of the gradient to update policy parameters based on fully observed episodes.
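
The update described above can be sketched on a toy problem. The example below is a minimal illustration, not a reference implementation: it assumes a multi-armed bandit in which every episode is a single step, a softmax policy over one logit per arm, and hypothetical names (`reinforce_bandit`, `arm_rewards`, `lr`) chosen only for this sketch.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a bandit with deterministic rewards (an assumed toy setting)."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)  # one logit per arm
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs, k=1)[0]
        ret = arm_rewards[action]  # Monte Carlo return of this one-step episode
        for a in range(len(theta)):
            # d/d theta_a of log softmax(theta)[action] = 1[a == action] - probs[a]
            grad_logp = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * ret * grad_logp
    return softmax(theta)
```

Calling `reinforce_bandit([1.0, 0.0])` should return a distribution that strongly favors the first arm, since only that arm ever produces a nonzero return.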

Actor-Critic Methods

Hybrid approach combining an actor that learns the policy and a critic that estimates the value function, reducing the variance of policy gradient estimates.

Advantage Function

Measure of how much better an action is than the policy's average action in a given state, calculated as the difference between the Q function and the V function, A(s, a) = Q(s, a) − V(s); using it in place of the raw return reduces gradient variance.
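
For a single state with a small discrete action set, the difference between Q and V can be computed directly. The sketch below assumes the Q-values and the policy's action probabilities are already known; the function name `advantages` is hypothetical.

```python
def advantages(q_values, probs):
    """Return A(s, a) = Q(s, a) - V(s) for every action in one state.

    V(s) is taken as the policy-weighted average of the Q-values.
    """
    v = sum(p * q for p, q in zip(probs, q_values))
    return [q - v for q in q_values]
```

With `q_values=[1.0, 3.0]` and a uniform policy, V(s) is 2.0, so the first action is below average and the second is above it. The policy-weighted mean of the advantages is zero by construction.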

Proximal Policy Optimization (PPO)

Algorithm optimizing the policy by constraining updates to stay close to the previous policy, using a clipped objective function to ensure learning stability.
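
The clipped objective mentioned above can be sketched for a batch of samples as follows. This is a minimal illustration assuming log-probabilities and advantage estimates are supplied as plain lists; the name `ppo_clip_objective` and the default `clip_eps=0.2` are assumptions made for the example.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside [1 - clip_eps, 1 + clip_eps] in the favorable direction.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage; the clipping only takes effect once the policies diverge.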

Trust Region Policy Optimization (TRPO)

Method that seeks monotonic performance improvement by optimizing the policy within a trust region defined by the KL divergence between successive policies.
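
The trust-region idea is commonly written as a constrained optimization problem. The form below is a sketch in standard notation that the definition itself does not fix: δ is an assumed trust-region radius and A is an advantage estimate under the old policy.

```latex
\max_\theta \;
\mathbb{E}_{s, a \sim \pi_{\theta_{\text{old}}}}
\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\,
A^{\pi_{\theta_{\text{old}}}}(s, a) \right]
\quad \text{subject to} \quad
\mathbb{E}_{s}\left[ D_{\mathrm{KL}}\!\left(
\pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s)
\right) \right] \le \delta
```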

Natural Policy Gradient

Variant of policy gradient that preconditions the gradient with the inverse Fisher information matrix to perform parameterization-invariant updates, which often gives more stable and efficient convergence.
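
In symbols, the update is usually written as below. The step size α and the objective J(θ) are notation assumed for this sketch.

```latex
F(\theta) = \mathbb{E}\left[
\nabla_\theta \log \pi_\theta(a \mid s)\,
\nabla_\theta \log \pi_\theta(a \mid s)^{\top} \right],
\qquad
\theta \leftarrow \theta + \alpha\, F(\theta)^{-1} \nabla_\theta J(\theta)
```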

Policy Network

Parameterized neural network that represents the policy π(a|s; θ), generating a probability distribution over actions conditioned on the current state.
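
As a minimal sketch of π(a|s; θ), the example below uses a single linear layer followed by a softmax, standing in for a full neural network. The names `policy` and `sample_action` and the layout of `theta` (one weight row per action) are assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(state, theta):
    """Return the action distribution pi(.|s; theta) for one state.

    theta holds one weight row per action; each logit is a dot product
    between that row and the state features.
    """
    logits = [sum(w * x for w, x in zip(row, state)) for row in theta]
    return softmax(logits)


def sample_action(state, theta, rng=random):
    """Draw an action index from pi(.|s; theta)."""
    probs = policy(state, theta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With all-zero weights every logit is equal, so the policy starts out uniform over actions.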

Monte Carlo Policy Gradient

A gradient estimation technique that uses complete trajectories to calculate returns, providing an unbiased but high-variance estimate.

Baseline Function

A function of the state subtracted from the return to reduce the variance of the gradient estimate without introducing bias, typically the state-value function.
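
The reason no bias is introduced can be shown in one line for a discrete action space, assuming the baseline b(s) depends only on the state and not on the action:

```latex
\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}
\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, b(s) \right]
= b(s) \sum_a \nabla_\theta \pi_\theta(a \mid s)
= b(s)\, \nabla_\theta \sum_a \pi_\theta(a \mid s)
= b(s)\, \nabla_\theta 1
= 0
```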

Importance Sampling

A technique that allows using data collected with an old policy to update a new policy, by weighting samples according to the probability ratio of the policies.
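
The probability-ratio weighting can be sketched for a single-state problem with discrete actions. The function name `importance_sampled_mean` and its arguments are hypothetical, and the old policy is assumed to give nonzero probability to every sampled action.

```python
def importance_sampled_mean(actions, rewards, old_probs, new_probs):
    """Estimate the new policy's expected reward from old-policy samples.

    Each sample is weighted by new_probs[a] / old_probs[a], the ratio of
    the action's probability under the new and old policies.
    """
    weighted = [
        new_probs[a] / old_probs[a] * r
        for a, r in zip(actions, rewards)
    ]
    return sum(weighted) / len(weighted)
```

For example, if the old policy picks two actions uniformly, the new policy picks action 0 with probability 0.8, and only action 0 pays a reward of 1, then one sample of each action yields an estimate of 0.8, which matches the new policy's true expected reward.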

Entropy Regularization

Adding an entropy term to the objective function to encourage exploration by penalizing overly deterministic policies, improving the robustness of learning.
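
A minimal sketch of the entropy bonus for a discrete action distribution follows. The coefficient name `beta`, its default value, and the function names are assumptions made for this example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_objective(expected_return, probs, beta=0.01):
    """Expected return plus an entropy bonus weighted by beta."""
    return expected_return + beta * entropy(probs)
```

A deterministic distribution has zero entropy and receives no bonus, while a uniform distribution receives the largest bonus, which is what nudges the policy toward exploration.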

Deterministic Policy Gradient

An extension of policy gradient to deterministic policies, where the policy maps each state directly to an action; particularly effective in high-dimensional continuous action spaces.
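
The corresponding gradient is commonly stated as below, where μ_θ denotes the deterministic policy and ρ^μ its state distribution (notation assumed for this sketch):

```latex
\nabla_\theta J(\theta) =
\mathbb{E}_{s \sim \rho^{\mu}}
\left[ \nabla_\theta \mu_\theta(s)\,
\nabla_a Q^{\mu}(s, a) \big|_{a = \mu_\theta(s)} \right]
```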

Stochastic Policy

A policy represented by a probability distribution π(a|s) over actions, which allows for intrinsic exploration and is the form assumed by standard policy gradient methods.

KL Divergence Constraint

A constraint that limits the Kullback-Leibler divergence between successive policies to ensure stable updates and avoid overly drastic changes in behavior.
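
For discrete action distributions in a single state, the constraint can be checked directly. The threshold name `max_kl` and its default are assumptions made for this sketch, and the new policy is assumed to give nonzero probability to every action the old policy can take.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def within_trust_region(old_probs, new_probs, max_kl=0.01):
    """True if the update from old_probs to new_probs stays within max_kl."""
    return kl_divergence(old_probs, new_probs) <= max_kl
```

A small shift such as `[0.5, 0.5]` to `[0.51, 0.49]` passes the check, whereas a jump to `[0.9, 0.1]` would be rejected as too drastic.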

Generalized Advantage Estimation (GAE)

An advantage estimation method that trades off bias against variance through an exponentially weighted average of multi-step estimators, with a parameter λ controlling the trade-off.
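
The weighted average is usually computed with a backward recursion over one-step TD errors. The sketch below assumes a single trajectory, a list of value estimates `values[t]` for V(s_t), and a bootstrap value `last_value` for the state after the final step (0.0 if that state is terminal); all names are hypothetical.

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory."""
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of this and all later TD errors
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

Setting `lam=0` recovers the one-step TD error (lower variance, more bias), while `lam=1` recovers the discounted return minus the value estimate (less bias, higher variance).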

Policy Gradient Theorem

Fundamental theorem providing an analytical expression for the gradient of the expected return with respect to the policy parameters, forming the theoretical basis of policy gradient methods.
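
In one common form, with d^π denoting the state distribution induced by the policy (notation assumed here), the expression reads:

```latex
\nabla_\theta J(\theta) \;\propto\;
\mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right]
```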

Return-to-Go

Sum of discounted future rewards from a given time step onward, used to weight each action's log-probability gradient in policy gradient algorithms.
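
The sum can be computed for every time step in a single backward pass over an episode's rewards. The function name `rewards_to_go` and the default discount `gamma=0.99` are assumptions made for this sketch.

```python
def rewards_to_go(rewards, gamma=0.99):
    """Return G_t = sum over k >= t of gamma**(k - t) * r_k for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, `rewards_to_go([1.0, 1.0, 1.0], gamma=0.5)` gives `[1.75, 1.5, 1.0]`: the last step sees only its own reward, and each earlier step adds half of the value that follows it.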
