
Glossario IA

The complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms
📖 terms

Policy Gradient

Direct optimization method that adjusts policy parameters by following the gradient of the expected return, enabling learning of stochastic policies without requiring an environment model.
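A minimal, self-contained sketch of the idea (the two-armed bandit, the sigmoid parameterization, the seed, and the learning rate are our own illustrative choices, not part of this glossary): we ascend the score-function gradient estimate, grad J = E[grad log pi(a) * R(a)], sampled one action at a time.

```python
import math, random

random.seed(0)

def reward(action):
    return 1.0 if action == 1 else 0.0  # arm 1 is strictly better

theta = 0.0  # single policy parameter; pi(a=1) = sigmoid(theta)
lr = 0.5
for _ in range(200):
    p = 1.0 / (1.0 + math.exp(-theta))
    a = 1 if random.random() < p else 0
    grad_log_pi = (1 - p) if a == 1 else -p  # d/dtheta log pi(a)
    theta += lr * grad_log_pi * reward(a)    # stochastic gradient ascent
```

Note that no model of the environment is used anywhere: the update needs only sampled actions and their observed rewards, which is the point of the definition above.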

REINFORCE Algorithm

Basic policy gradient algorithm that uses a Monte Carlo estimate of the gradient to update policy parameters from complete episodes.
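A hedged sketch of the episodic update (the toy environment, softmax policy, seed, and hyperparameters are our own): each parameter update is theta += lr * G * grad log pi(a_t), where G is the full-episode Monte Carlo return.

```python
import math, random

random.seed(1)
theta = [0.0, 0.0]  # one logit per action

def probs(th):
    # numerically stable softmax over the two logits
    m = max(th)
    e = [math.exp(x - m) for x in th]
    s = sum(e)
    return [x / s for x in e]

lr = 0.1
for _ in range(300):
    actions, rewards = [], []
    for _t in range(3):                 # a 3-step episode
        p = probs(theta)
        a = 1 if random.random() < p[1] else 0
        actions.append(a)
        rewards.append(float(a))        # action 1 yields reward 1
    G = sum(rewards)                    # Monte Carlo return of the episode
    for a in actions:
        p = probs(theta)
        for i in range(2):
            # gradient of log softmax: indicator(i == a) - p[i]
            g = (1.0 if i == a else 0.0) - p[i]
            theta[i] += lr * G * g
```

Because every action in the episode is weighted by the same return G, the estimate is unbiased but noisy, which is why the baseline and advantage entries below exist.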

Actor-Critic Methods

Hybrid approach combining an actor that learns the policy and a critic that estimates the value function, reducing the variance of policy gradient estimates.
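A deliberately tiny sketch (our own one-state setup, not from this glossary): the critic is a single value estimate V, the actor a sigmoid policy, and the critic's TD error (here simply r - V, since there is no next state) replaces the raw return in the actor's update, reducing variance.

```python
import math, random

random.seed(2)
theta, V = 0.0, 0.0
lr_actor, lr_critic = 0.2, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + math.exp(-theta))
    a = 1 if random.random() < p else 0
    r = 1.0 if a == 1 else 0.0
    td_error = r - V                            # critic's advantage estimate
    V += lr_critic * td_error                   # critic update
    grad_log_pi = (1 - p) if a == 1 else -p
    theta += lr_actor * td_error * grad_log_pi  # actor update
```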

Advantage Function

Measure of how much better an action is than the average action in a given state, computed as the difference between the Q-function and the V-function, A(s, a) = Q(s, a) − V(s); subtracting V reduces gradient variance.
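A numeric illustration (the values are invented for the example): in a state with two actions, A(s, a) = Q(s, a) − V(s), and the policy-weighted advantage is exactly zero, since V is the policy's average of Q.

```python
pi = [0.25, 0.75]                        # pi(a|s)
Q = [1.0, 3.0]                           # Q(s, a)
V = sum(p * q for p, q in zip(pi, Q))    # V(s) = E_a[Q(s, a)] = 2.5
A = [q - V for q in Q]                   # advantages: [-1.5, 0.5]
weighted = sum(p * a for p, a in zip(pi, A))  # = 0.0 by construction
```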

Proximal Policy Optimization (PPO)

Algorithm optimizing the policy by constraining updates to stay close to the previous policy, using a clipped objective function to ensure learning stability.
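The clipped surrogate for a single sample can be sketched as follows (epsilon and the example numbers are illustrative): L = min(r · A, clip(r, 1−eps, 1+eps) · A), where r = pi_new(a|s) / pi_old(a|s).

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    # clip the probability ratio into [1 - eps, 1 + eps]
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    # take the pessimistic (smaller) of the two surrogate objectives
    return min(ratio * advantage, clipped * advantage)
```

With a positive advantage, pushing the ratio beyond the clip range gains nothing (ppo_clip_loss(2.0, 1.0) equals ppo_clip_loss(1.2, 1.0)), which is what keeps the new policy close to the old one.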

Trust Region Policy Optimization (TRPO)

Method ensuring monotonic performance improvements by optimizing the policy within a trust region defined by the KL divergence between successive policies.

Natural Policy Gradient

Variant of policy gradient using the Fisher metric to perform parameterization-invariant updates, ensuring more stable and efficient convergence.
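A one-parameter illustration (our own construction): for a Bernoulli policy with p = sigmoid(theta), the Fisher information is F = p·(1−p), and the natural gradient rescales the vanilla gradient by 1/F, so the update size no longer depends on how saturated the sigmoid is.

```python
import math

theta = 3.0
p = 1.0 / (1.0 + math.exp(-theta))
vanilla_grad = 1 - p              # d/dtheta log pi(a=1) for the sigmoid policy
fisher = p * (1 - p)              # Fisher information of Bernoulli(p) in theta
natural_grad = vanilla_grad / fisher   # simplifies to 1 / p
```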

Policy Network

Parameterized neural network that represents the policy π(a|s; θ), generating a probability distribution over actions conditioned on the current state.
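A minimal policy "network" sketch (a single linear layer, our own simplification of the definition above): it maps a state vector to logits and applies a softmax to obtain the distribution π(a|s; θ).

```python
import math

def softmax(z):
    # numerically stable softmax
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def policy(state, W, b):
    # one linear layer: logits = W @ state + b, then softmax
    logits = [sum(w * x for w, x in zip(row, state)) + bi
              for row, bi in zip(W, b)]
    return softmax(logits)

W = [[0.5, -0.2], [0.1, 0.3]]   # 2 actions x 2 state features (arbitrary)
b = [0.0, 0.0]
pi = policy([1.0, 2.0], W, b)   # a probability distribution over actions
```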

Monte Carlo Policy Gradient

A gradient estimation technique that uses complete trajectories to calculate returns, providing an unbiased but high-variance estimate.

Baseline Function

A function subtracted from the return to reduce the variance of the gradient estimate without introducing bias, typically the state-value function.
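A numeric illustration of the variance reduction (the bandit numbers are ours): for a two-armed bandit with p = 0.5, R(1) = 1, R(0) = 0, subtracting the baseline b = V = 0.5 leaves the gradient's expectation unchanged but collapses its variance.

```python
p = 0.5
R = {1: 1.0, 0: 0.0}
grad_log = {1: 1 - p, 0: -p}     # d/dtheta log pi(a) for a sigmoid policy
pi = {1: p, 0: 1 - p}

def mean_var(b):
    # mean and variance of the single-sample gradient estimate with baseline b
    g = {a: grad_log[a] * (R[a] - b) for a in (0, 1)}
    mean = sum(pi[a] * g[a] for a in (0, 1))
    var = sum(pi[a] * (g[a] - mean) ** 2 for a in (0, 1))
    return mean, var

m0, v0 = mean_var(0.0)     # no baseline
m1, v1 = mean_var(0.5)     # baseline = V(s) = 0.5
```

In this toy case the baseline removes the variance entirely; in general it only reduces it, but the mean is always preserved.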

Importance Sampling

A technique that allows using data collected with an old policy to update a new policy, by weighting samples according to the probability ratio of the policies.
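A sketch with toy distributions of our own choosing: we estimate the new policy's expected reward using only samples drawn from the old policy, weighting each sample by pi_new(a) / pi_old(a).

```python
import random

random.seed(3)
pi_old = {0: 0.5, 1: 0.5}
pi_new = {0: 0.2, 1: 0.8}
R = {0: 0.0, 1: 1.0}

n = 20000
total = 0.0
for _ in range(n):
    a = 1 if random.random() < pi_old[1] else 0   # sample from the OLD policy
    total += (pi_new[a] / pi_old[a]) * R[a]       # reweight toward the NEW one
estimate = total / n   # approaches E_new[R] = 0.8
```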

Entropy Regularization

Adding an entropy term to the objective function to encourage exploration by penalizing overly deterministic policies, improving the robustness of learning.
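A small illustration (the distributions are ours): entropy H(pi) = −Σ p log p is maximal for the uniform policy and shrinks as the policy becomes deterministic, so adding a bonus term beta · H to the objective rewards keeping the policy exploratory.

```python
import math

def entropy(p):
    # Shannon entropy in nats; skip zero-probability entries
    return -sum(x * math.log(x) for x in p if x > 0)

uniform = [0.5, 0.5]    # entropy = log(2), the maximum for two actions
peaked = [0.99, 0.01]   # nearly deterministic, much lower entropy
```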

Deterministic Policy Gradient

An extension of policy gradient to continuous action spaces where the policy is deterministic, particularly effective in high-dimensional environments.

Stochastic Policy

A policy represented by a probability distribution π(a|s) over actions, enabling intrinsic exploration; essential for policy gradient methods.

KL Divergence Constraint

A constraint that limits the Kullback-Leibler divergence between successive policies to ensure stable updates and avoid overly drastic changes in behavior.
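A sketch of the check itself (the distributions and threshold are illustrative): compute the discrete KL divergence between successive policies and accept an update only if it stays within a trust-region threshold delta.

```python
import math

def kl(p, q):
    # KL(p || q) for discrete distributions; skip zero-probability entries of p
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

old = [0.5, 0.5]
small_step = [0.55, 0.45]   # a gentle policy update
big_step = [0.95, 0.05]     # a drastic behavior change
delta = 0.01                # trust-region threshold
```

Here kl(old, small_step) falls under delta while kl(old, big_step) far exceeds it, so only the gentle update would be accepted.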

Generalized Advantage Estimation (GAE)

An advantage estimation method that trades off bias against variance through an exponentially weighted average of multi-step estimators, offering a tunable compromise for learning.
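A sketch of the backward computation (the toy rewards and values are ours): advantages are built from TD residuals delta_t = r_t + gamma·V(s_{t+1}) − V(s_t) via the recursion A_t = delta_t + gamma·lam·A_{t+1}.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    # values has len(rewards) + 1 entries (bootstrap value appended at the end)
    advantages = [0.0] * len(rewards)
    a = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        a = delta + gamma * lam * a
        advantages[t] = a
    return advantages

adv = gae([1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.0])
```

At lam = 0 this reduces to the one-step TD residual (low variance, high bias); at lam = 1 it becomes the full Monte Carlo return minus the value (unbiased, high variance).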

Policy Gradient Theorem

Fundamental theorem providing an analytical expression of the gradient of the expected return with respect to the policy parameters, forming the theoretical basis of policy gradient methods.
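In standard notation (this is the common statement of the theorem, not quoted from this glossary):

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right]
```

where d^{π_θ} denotes the (discounted) state-visitation distribution under the policy. The key point is that the gradient of the environment dynamics never appears, which is what makes model-free policy gradient methods possible.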

Return-to-Go

Sum of discounted future rewards from a given time step, used to weight the gradient estimate in policy gradient algorithms.
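A short sketch (the reward sequence is ours): the return-to-go G_t = Σ_{k≥t} gamma^(k−t) r_k for every time step can be computed in a single backward pass.

```python
def returns_to_go(rewards, gamma=0.9):
    # one backward pass: G_t = r_t + gamma * G_{t+1}
    out = [0.0] * len(rewards)
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = rewards[t] + gamma * acc
        out[t] = acc
    return out

rtg = returns_to_go([1.0, 0.0, 2.0])
```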
