
AI Glossary

A complete glossary of artificial intelligence

162 categories · 2 032 subcategories · 23 060 terms

Policy Gradient

Direct optimization method that adjusts policy parameters by following the gradient of the expected return, enabling learning of stochastic policies without requiring an environment model.
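
A minimal sketch of the quantity that drives these updates, assuming a linear softmax policy over a few discrete actions (all numbers illustrative): the score function ∇_θ log π_θ(a) has the closed form one_hot(a) − π, which we can check against a finite-difference estimate.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def grad_log_pi(theta, action):
    """Score function of a softmax policy: d log pi(action) / d theta = one_hot(action) - pi."""
    pi = softmax(theta)
    g = -pi
    g[action] += 1.0
    return g

theta = np.array([0.5, -0.2, 0.1])
g = grad_log_pi(theta, action=0)

# Sanity check against a central finite difference on theta[0].
eps = 1e-6
tp = theta.copy(); tp[0] += eps
tm = theta.copy(); tm[0] -= eps
fd = (np.log(softmax(tp)[0]) - np.log(softmax(tm)[0])) / (2 * eps)
assert abs(g[0] - fd) < 1e-5
```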

REINFORCE Algorithm

Basic policy gradient algorithm using a Monte Carlo estimate of the gradient to update policy parameters based on fully observed episodes.
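
A runnable sketch on a hypothetical two-armed bandit (arm means 0.2 and 0.8, learning rate and noise level all illustrative): sample an action from the policy, observe a Monte Carlo return, and move the logits along r · ∇ log π.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical two-armed bandit: arm 1 pays more on average.
true_means = np.array([0.2, 0.8])
theta = np.zeros(2)   # policy parameters (softmax logits)
alpha = 0.1           # learning rate

for _ in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)               # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)    # Monte Carlo return (one-step episode)
    score = -pi
    score[a] += 1.0                       # grad of log pi(a) w.r.t. theta
    theta += alpha * r * score            # REINFORCE update

final_pi = softmax(theta)                 # ends up favoring the better arm
```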

Actor-Critic Methods

Hybrid approach combining an actor, which learns the policy, with a critic, which estimates the value function, thereby reducing the variance of policy gradient estimates.

Advantage Function

Measure of how much better an action is than the average action in a given state, computed as the difference between the Q-function Q(s, a) and the value function V(s); subtracting V(s) reduces the variance of the gradient estimate.
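
A small numerical illustration (Q-values and policy probabilities are made up): with V(s) defined as the policy-weighted average of Q(s, a), the advantages are zero-mean under the policy.

```python
import numpy as np

# Hypothetical state with 3 actions: action values Q(s, a) and policy pi(a|s).
q = np.array([1.0, 2.0, 4.0])
pi = np.array([0.5, 0.3, 0.2])

v = pi @ q      # V(s) = E_{a~pi}[Q(s, a)]
adv = q - v     # A(s, a) = Q(s, a) - V(s)

assert np.isclose(pi @ adv, 0.0)  # advantages average to zero under the policy
```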

Proximal Policy Optimization (PPO)

Algorithm optimizing the policy by constraining updates to stay close to the previous policy, using a clipped objective function to ensure learning stability.
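
The clipped surrogate at the heart of PPO can be sketched in a few lines (eps = 0.2 is the commonly used default; the ratios and advantages below are illustrative):

```python
import numpy as np

def ppo_clip(ratio, adv, eps=0.2):
    """Clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv)

# With a positive advantage, a ratio far above 1 earns no extra credit...
assert np.isclose(ppo_clip(1.5, 1.0), 1.2)
# ...and with a negative advantage the pessimistic (clipped) value is kept.
assert np.isclose(ppo_clip(0.5, -1.0), -0.8)
```

Taking the minimum makes the objective a pessimistic bound, so the optimizer has no incentive to push the new policy far from the old one.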

Trust Region Policy Optimization (TRPO)

Method ensuring monotonic performance improvements by optimizing the policy within a trust region defined by the KL divergence between successive policies.

Natural Policy Gradient

Variant of policy gradient using the Fisher metric to perform parameterization-invariant updates, ensuring more stable and efficient convergence.

Policy Network

Parameterized neural network that represents the policy π(a|s; θ), generating a probability distribution over actions conditioned on the current state.
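
A minimal policy network sketch in plain NumPy, assuming a 4-dimensional state and 2 actions (sizes and initialization are illustrative): one hidden layer followed by a softmax produces π(a|s; θ).

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters theta: one hidden layer (4 -> 16) and an output layer (16 -> 2).
W1 = rng.normal(0, 0.1, (4, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (16, 2)); b2 = np.zeros(2)

def policy(state):
    """pi(a|s; theta): a probability distribution over the 2 actions."""
    h = np.tanh(state @ W1 + b1)
    logits = h @ W2 + b2
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

pi = policy(np.array([0.1, -0.2, 0.3, 0.0]))
assert pi.shape == (2,) and np.isclose(pi.sum(), 1.0) and (pi > 0).all()
```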

Monte Carlo Policy Gradient

A gradient estimation technique that uses complete trajectories to calculate returns, providing an unbiased but high-variance estimate.

Baseline Function

A function subtracted from the return to reduce the variance of the gradient estimate without introducing bias, typically the state-value function.

Importance Sampling

A technique that allows using data collected with an old policy to update a new policy, by weighting samples according to the probability ratio of the policies.
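
A small Monte Carlo illustration with two made-up discrete policies: actions sampled under the old policy, reweighted by π_new / π_old, recover the expected reward under the new policy.

```python
import numpy as np

rng = np.random.default_rng(0)

pi_old = np.array([0.5, 0.3, 0.2])   # behavior policy: data was collected here
pi_new = np.array([0.2, 0.3, 0.5])   # target policy: we want expectations here
rewards = np.array([1.0, 2.0, 3.0])  # hypothetical reward per action

# Sample from the old policy, then reweight each sample by pi_new / pi_old.
a = rng.choice(3, size=200_000, p=pi_old)
w = pi_new[a] / pi_old[a]
is_estimate = np.mean(w * rewards[a])

true_value = pi_new @ rewards        # exact expectation under the new policy
assert abs(is_estimate - true_value) < 0.05
```

The estimate is unbiased, but its variance grows with the mismatch between the two policies, which is why methods like PPO keep the ratio close to 1.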

Entropy Regularization

Adding an entropy term to the objective function to encourage exploration by penalizing overly deterministic policies, improving the robustness of learning.
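
A quick sketch of the quantity being added (probabilities are illustrative): the Shannon entropy of the action distribution is largest for a uniform policy and small for a nearly deterministic one, so an entropy bonus β·H(π) rewards exploration.

```python
import numpy as np

def entropy(pi):
    """Shannon entropy of a discrete policy (higher = more exploratory)."""
    pi = np.asarray(pi)
    return -np.sum(pi * np.log(pi + 1e-12))  # epsilon guards against log(0)

uniform = entropy([0.25, 0.25, 0.25, 0.25])  # maximally exploratory
peaked = entropy([0.97, 0.01, 0.01, 0.01])   # nearly deterministic

# The regularized objective adds beta * entropy (beta is a tunable coefficient),
# so the peaked policy is penalized relative to the uniform one.
assert uniform > peaked
assert np.isclose(uniform, np.log(4))
```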

Deterministic Policy Gradient

An extension of policy gradient to continuous action spaces where the policy is deterministic, particularly effective in high-dimensional environments.

Stochastic Policy

A policy represented by a probability distribution π(a|s) over actions, allowing for built-in exploration; such policies are essential for standard policy gradient methods.

KL Divergence Constraint

A constraint that limits the Kullback-Leibler divergence between successive policies to ensure stable updates and avoid overly drastic changes in behavior.
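
A numerical illustration with made-up categorical policies and an illustrative trust-region radius: a small update stays inside the KL budget, a drastic one does not.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    p, q = np.asarray(p), np.asarray(q)
    return np.sum(p * np.log(p / q))

old_pi = np.array([0.5, 0.5])
small_step = np.array([0.55, 0.45])
big_step = np.array([0.95, 0.05])

delta = 0.01  # trust-region radius (illustrative value)
assert kl(old_pi, small_step) < delta  # accepted: a gentle update
assert kl(old_pi, big_step) > delta    # rejected: too drastic a change
```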

Generalized Advantage Estimation (GAE)

An advantage estimation method that trades off bias against variance through an exponentially weighted average of multi-step estimators, offering a tunable compromise for learning.
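
A compact sketch of the backward recursion (gamma and lambda set to their commonly used defaults; rewards and values are illustrative): GAE accumulates the TD residuals δ_t with decay γλ, so λ = 0 recovers the one-step TD advantage and λ = 1 recovers the Monte Carlo estimate.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one episode.

    `values` has length len(rewards) + 1 (bootstrap value for the final state).
    """
    deltas = rewards + gamma * values[1:] - values[:-1]  # TD residuals
    adv = np.zeros_like(rewards, dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv

rewards = np.array([1.0, 1.0, 1.0])
values = np.array([0.5, 0.5, 0.5, 0.0])

# lam = 0 gives exactly the one-step TD error at each step.
assert np.allclose(gae(rewards, values, lam=0.0),
                   rewards + 0.99 * values[1:] - values[:-1])
```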

Policy Gradient Theorem

Fundamental theorem giving an analytical expression for the gradient of the expected return with respect to the policy parameters, forming the theoretical basis of policy gradient methods.
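
In one common episodic formulation (with G_t denoting the return-to-go from step t), the theorem reads:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\left[
      \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t
    \right]
```

The expectation is over trajectories τ sampled from the current policy, which is what makes the gradient estimable from rollouts alone, without a model of the environment.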

Return-to-Go

Sum of discounted future rewards from a given time step onward, used to weight the score function in policy gradient estimators.
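
A short sketch of the standard backward pass (reward values and discount are illustrative): each G_t is the current reward plus γ times the return-to-go of the next step.

```python
import numpy as np

def returns_to_go(rewards, gamma=0.99):
    """G_t = sum_{k >= t} gamma^(k - t) * r_k, via a single backward pass."""
    out = np.zeros_like(rewards, dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

g = returns_to_go(np.array([1.0, 2.0, 3.0]), gamma=0.5)
# G_2 = 3, G_1 = 2 + 0.5*3 = 3.5, G_0 = 1 + 0.5*3.5 = 2.75
assert np.allclose(g, [2.75, 3.5, 3.0])
```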
