Observation-based Imitation

Inverse Reinforcement Learning

Method that infers the reward function an expert is assumed to be optimizing from its demonstrated trajectories; the agent then learns a policy by optimizing the recovered reward.

State-only Imitation Learning

Learning paradigm where the agent has access only to the states visited by the expert, not the actions taken, so it must either infer the missing actions or match the expert's state transitions directly.

Trajectory Matching

Approach that minimizes the divergence between the trajectory distributions generated by the agent and those of the expert, often used in learning without access to actions.

GAIL

Generative Adversarial Imitation Learning. Framework combining imitation learning and generative adversarial networks: a discriminator is trained to distinguish the expert's state-action pairs from those of the agent, and the agent's policy is trained to fool it.
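
As a rough illustration, the adversarial game is commonly written as the saddle-point objective below. The symbols are assumptions introduced here, not notation from this glossary: \pi is the agent's policy, \pi_E the expert's policy, D(s,a) a discriminator output in (0,1), H(\pi) the policy's entropy, and \lambda \ge 0 a regularization weight.

```latex
\min_{\pi} \max_{D} \;
\mathbb{E}_{\pi}\big[\log D(s,a)\big]
+ \mathbb{E}_{\pi_E}\big[\log\big(1 - D(s,a)\big)\big]
- \lambda H(\pi)
```

Under this sign convention the discriminator pushes D toward 1 on agent samples and toward 0 on expert samples, while the policy is updated so that its samples become hard to tell apart from the expert's.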

Dataset Aggregation

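Iterative algorithm (DAgger) that runs the current agent, asks the expert to label the states the agent actually visits with the correct actions, and aggregates these labels into a growing dataset on which the agent is retrained. Unlike state-only methods, it requires access to expert actions.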
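
The aggregation loop can be sketched on a toy problem. Everything in this example is hypothetical and chosen only for illustration: the one-dimensional environment, the `expert_action` rule, and the tabular "majority vote" learner stand in for a real task, a real expert, and a real supervised model.

```python
import random
from collections import Counter, defaultdict

GOAL = 5  # hypothetical toy task: reach position 5 on the integer line 0..10

def expert_action(state):
    """Hypothetical expert: step toward GOAL (+1 or -1), stay (0) once there."""
    return (GOAL > state) - (GOAL < state)

def step(state, action):
    """Toy deterministic dynamics, clipped to the range [0, 10]."""
    return max(0, min(10, state + action))

def fit(dataset):
    """'Train' a tabular policy: most frequent expert label per state."""
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def rollout(policy, rng, start, horizon=12):
    """Run the learner; states it has never seen get a random action."""
    s, visited = start, []
    for _ in range(horizon):
        visited.append(s)
        a = policy.get(s, rng.choice([-1, 0, 1]))
        s = step(s, a)
    return visited

def dagger(iterations=10, seed=0):
    rng = random.Random(seed)
    dataset, policy = [], {}
    for _ in range(iterations):
        # 1. Visit states with the *current* learner, mistakes included.
        states = rollout(policy, rng, start=rng.randint(0, 10))
        # 2. Ask the expert what it would have done in those states.
        dataset += [(s, expert_action(s)) for s in states]
        # 3. Retrain on the aggregated dataset.
        policy = fit(dataset)
    return policy, dataset

policy, dataset = dagger()
```

The point of the sketch is step 1: the labelled states come from the learner's own rollouts rather than from the expert's, which is what distinguishes this scheme from plain behavioral cloning.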

Forward-Forward Algorithm

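As used in this glossary, an unsupervised method that predicts future states from current states without requiring action data, used in imitation by observation. The name is more widely associated with Hinton's Forward-Forward algorithm, a training procedure that replaces backpropagation with two forward passes, which is a different technique.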

Observation-based Learning

Learning process where the agent acquires skills by observing only environmental states and results, without direct access to the expert's actions.

State Distribution Matching

Technique aiming to align the distribution of states visited by the agent with that of the expert, used when actions are not observable.
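
A minimal sketch of the quantity being minimized, assuming a small discrete state space and using the KL divergence between smoothed empirical state histograms. The helper names, the smoothing constant `eps`, and the sample state sequences are all hypothetical; practical methods usually estimate the divergence with a learned discriminator or density model instead.

```python
import math
from collections import Counter

def state_distribution(states, support, eps=1e-6):
    """Empirical state distribution over `support`, with additive smoothing."""
    counts = Counter(states)
    total = len(states) + eps * len(support)
    return {s: (counts[s] + eps) / total for s in support}

def kl_divergence(p, q):
    """KL(p || q) for two distributions defined on the same support."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p)

support = [0, 1, 2, 3]
p_expert = state_distribution([0, 1, 2, 3, 3, 3], support)  # expert reaches state 3
p_close = state_distribution([0, 1, 2, 3, 3, 3], support)   # agent visiting the same states
p_far = state_distribution([0, 0, 0, 1, 1, 0], support)     # agent stuck near the start

kl_close = kl_divergence(p_close, p_expert)
kl_far = kl_divergence(p_far, p_expert)
```

An agent whose visited states mirror the expert's gets a divergence of zero, while one that lingers in states the expert rarely visits gets a large value; a state-matching method would use that value, or a reward derived from it, as its training signal.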

No-action Imitation

A form of imitation learning where the agent must learn to reproduce expert behavior without any information about the actions taken.

Passive Learning

Learning mode where the agent passively observes demonstrations without active interaction with the environment, as in offline variants of imitation by observation.

Expert Demonstration

Set of trajectories or states provided by an expert serving as reference for imitation learning, crucial in approaches without access to actions.

State-Action Distribution

Joint distribution of states and actions that the agent seeks to approximate, often inferred from the state distribution alone in imitation by observation.

Trajectory-based Learning

Learning approach that focuses on reproducing complete trajectories rather than individual state-action decisions, adapted to observation without actions.

Dynamics Model

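Model of the transitions between consecutive states. In imitation by observation, an inverse dynamics model, typically trained on the agent's own interaction data, is applied to consecutive states in expert demonstrations to infer the actions that were not observed.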
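
One way this can be used is sketched below, under loud assumptions: a toy one-dimensional environment, an agent that can try every action in every state, and a tabular inverse dynamics model mapping a pair of consecutive states to an action. All function names are hypothetical, and a real system would fit a parametric model on sampled transitions instead.

```python
from collections import Counter, defaultdict

def step(state, action):
    """Toy deterministic dynamics on the integer line, clipped to [0, 10]."""
    return max(0, min(10, state + action))

def collect_agent_transitions():
    """The agent's own experience: every action tried in every state."""
    return [(s, a, step(s, a)) for s in range(11) for a in (-1, 0, 1)]

def fit_inverse_dynamics(transitions):
    """Tabular inverse model: (state, next_state) -> most frequent action."""
    votes = defaultdict(Counter)
    for s, a, s_next in transitions:
        votes[(s, s_next)][a] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in votes.items()}

def infer_actions(model, states):
    """Label a state-only trajectory; None marks transitions never seen."""
    return [model.get(pair) for pair in zip(states, states[1:])]

model = fit_inverse_dynamics(collect_agent_transitions())

# State-only expert demonstration: walk from 2 to 5, then stay.
expert_states = [2, 3, 4, 5, 5]
inferred = infer_actions(model, expert_states)
```

The inferred labels turn the state-only demonstration into ordinary state-action pairs that a supervised learner can consume. At the boundaries of this toy world the inverse model is ambiguous (staying at 0 can result from action -1 or 0), which mirrors a real limitation: actions can only be recovered where they leave a distinguishable trace in the states.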

Occupancy Measure

Distribution giving the (typically discounted) visitation frequency of each state-action pair under a policy. When only states are observable, it is restricted to states or to state transitions.
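
A minimal sketch of how such a measure could be estimated from sampled trajectories, assuming discrete states and actions and a discount factor `gamma`. The function names and the sample trajectory are hypothetical.

```python
from collections import defaultdict

def occupancy(trajectories, gamma=0.9):
    """Normalized discounted visitation frequency of each (state, action) pair."""
    rho = defaultdict(float)
    for trajectory in trajectories:
        for t, (s, a) in enumerate(trajectory):
            rho[(s, a)] += gamma ** t  # later visits count less
    total = sum(rho.values())
    return {sa: w / total for sa, w in rho.items()}

def state_marginal(rho):
    """Sum out the actions: what remains when only states are observable."""
    mu = defaultdict(float)
    for (s, _action), w in rho.items():
        mu[s] += w
    return dict(mu)

# One trajectory of (state, action) pairs; weights are 1, 0.5, 0.25 for gamma = 0.5.
rho = occupancy([[(0, 1), (1, 1), (2, 0)]], gamma=0.5)
mu = state_marginal(rho)
```

The state marginal `mu` is the part of the occupancy measure that a state-only imitator can compare against the expert.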
