AI Glossary
The complete dictionary of Artificial Intelligence
Distributional Q-Learning
A variant of Q-learning that learns the full probability distribution of returns, whose expectation is the Q-value, instead of estimating only that expectation, allowing for a richer characterization of uncertainty.
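A minimal NumPy sketch (the function name is ours, not from any library) of one consequence of this definition: even when the full distribution is learned, greedy control can still use the implied expectation Q(s, a) = Σ_i z_i · p_i(s, a).

```python
import numpy as np

def greedy_action(atoms, action_probs):
    """Given the support atoms z_i and, per action, the learned probabilities
    p_i(s, a), recover the expected Q-values and act greedily on them."""
    action_probs = np.asarray(action_probs, dtype=float)   # (n_actions, n_atoms)
    q_values = action_probs @ np.asarray(atoms, dtype=float)
    return int(np.argmax(q_values)), q_values
```

The distribution itself remains available for risk-sensitive criteria (see the entries on risk-aware methods below).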
Categorical Distributional RL
A method that represents the distribution of returns as a discrete set of probabilities over predefined value atoms, using categorical projection to ensure stability.
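The categorical projection mentioned here can be sketched in a few lines of NumPy (a single-transition, tabular illustration in the style of C51; names and defaults are ours): the Bellman-updated atoms r + γz generally fall between the fixed support points, so each probability mass is split between its two neighbouring atoms.

```python
import numpy as np

def project_distribution(next_probs, reward, gamma=0.99,
                         v_min=-10.0, v_max=10.0, n_atoms=51):
    """Project the shifted/shrunk distribution r + gamma*z back onto the
    fixed support atoms, distributing mass to neighbouring atoms."""
    z = np.linspace(v_min, v_max, n_atoms)          # fixed support atoms
    delta = (v_max - v_min) / (n_atoms - 1)
    tz = np.clip(reward + gamma * z, v_min, v_max)  # Bellman-updated atoms
    b = (tz - v_min) / delta                        # fractional atom index
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    projected = np.zeros(n_atoms)
    for j in range(n_atoms):
        if lower[j] == upper[j]:                    # lands exactly on an atom
            projected[lower[j]] += next_probs[j]
        else:                                       # split between neighbours
            projected[lower[j]] += next_probs[j] * (upper[j] - b[j])
            projected[upper[j]] += next_probs[j] * (b[j] - lower[j])
    return z, projected
```

Because every mass is fully redistributed, the projected vector remains a valid probability distribution, which is what keeps the update stable.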
Quantile Regression RL
A distributional approach using quantile regression to directly model the quantiles of the return distribution, offering a flexible and continuous representation.
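The quantile-regression objective can be sketched as follows (a NumPy illustration of the quantile Huber loss used in QR-DQN-style methods; names and the default kappa are ours): each predicted quantile θ_i is pulled toward its quantile level τ_i of the target distribution via an asymmetrically weighted Huber loss.

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Asymmetric Huber loss: under-/over-estimation errors are weighted
    by |tau - 1{u < 0}|, so theta_i converges to the tau_i-quantile."""
    n = len(pred_quantiles)
    taus = (np.arange(n) + 0.5) / n                       # midpoint levels
    u = target_samples[None, :] - pred_quantiles[:, None]  # pairwise TD errors
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber).mean()
```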
Implicit Quantile Network (IQN)
A neural network architecture that learns the return distribution by sampling quantile fractions τ and mapping them through a quantile embedding function, approximating the inverse cumulative distribution of returns and enabling continuous quantile estimation.
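The quantile embedding can be sketched with the cosine basis used in the IQN paper (a NumPy illustration; the function name and default dimension are ours): each sampled fraction τ ∈ [0, 1] becomes a feature vector φ_i(τ) = cos(π·i·τ).

```python
import numpy as np

def quantile_embedding(taus, embed_dim=64):
    """Cosine embedding of quantile fractions. In the full network this
    vector passes through a learned linear layer + ReLU and is merged
    (elementwise) with the state features."""
    i = np.arange(embed_dim)
    taus = np.asarray(taus, dtype=float)[:, None]
    return np.cos(np.pi * i[None, :] * taus)   # shape (n_taus, embed_dim)
```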
Distributional Bellman Operator
A generalization of the classical Bellman operator that acts on return distributions rather than scalar values, preserving the complete distribution structure.
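A sample-based view of this operator (a minimal sketch; the function name is ours): the distributional Bellman equation (T Z)(s, a) = R + γ·Z(S′, A′) holds in distribution, so each sampled next-state return is simply shifted by the reward and shrunk by γ.

```python
import numpy as np

def bellman_target_samples(reward, next_return_samples, gamma=0.99):
    """Push samples of Z(s', a') through the distributional Bellman
    operator: the whole distribution is transformed, not just its mean."""
    return reward + gamma * np.asarray(next_return_samples, dtype=float)
```

Taking the mean of the output recovers the classical scalar Bellman backup, which is exactly the sense in which this operator generalizes it.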
Batch-Constrained Distributional RL
An offline approach that applies batch constraints to distributional methods to ensure that policies remain close to the behavior observed in the training dataset.
Offline Distributional Critic
A critic module in offline learning that estimates the distribution of returns to evaluate actions, using techniques to handle distribution shift and selection bias.
Distributional Policy Gradient
An extension of policy gradient methods that directly optimizes the distribution parameters of returns rather than only their expectation, enabling fine control over risk properties.
Risk-Aware Distributional RL
An offline approach that uses the complete distribution of returns to make risk-sensitive decisions, optimizing metrics such as Conditional Value-at-Risk (CVaR) or other coherent risk measures.
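CVaR is straightforward to estimate once return samples (or quantiles) are available, which is why it pairs naturally with distributional critics. A minimal sketch (names are ours):

```python
import numpy as np

def cvar(return_samples, alpha=0.1):
    """Conditional Value-at-Risk at level alpha: the mean of the worst
    alpha-fraction of returns. A risk-averse policy maximizes this
    instead of the plain expectation."""
    sorted_returns = np.sort(np.asarray(return_samples, dtype=float))
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return sorted_returns[:k].mean()
```

With α = 1 this reduces to the ordinary mean; smaller α focuses the objective on the left tail of the return distribution.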
Distributional Dynamics Model
A data-driven model of the environment that captures not only its average dynamics but also the distribution of transitions and rewards, which is essential for robust offline learning.
Distributional Advantage Estimation
An advantage estimation technique that considers the complete distribution of returns rather than only their means, allowing for a more nuanced offline evaluation of actions.
Conservative Distributional Learning
An offline learning paradigm that maintains conservative estimates of the return distribution to avoid overestimation caused by distribution shift in the offline data.
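One simple way to make a distributional estimate conservative (a hypothetical sketch, not a specific published algorithm; names and the default beta are ours) is to penalize the mean by the spread of the learned return distribution, so that high-uncertainty, likely out-of-distribution actions score lower.

```python
import numpy as np

def conservative_estimate(return_samples, beta=1.0):
    """Pessimistic value: mean minus beta times the standard deviation of
    the return distribution. beta trades off conservatism vs. fidelity."""
    samples = np.asarray(return_samples, dtype=float)
    return samples.mean() - beta * samples.std()
```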
Distributional Sample Efficiency
A measure of how efficiently offline distributional methods learn from limited samples by exploiting the rich structure of distributional information.