
AI Glossary

The complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Model-Based Inverse Reinforcement Learning

An approach that infers a reward function from expert demonstrations using an environment model to generate and evaluate plausible alternative trajectories.
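
A minimal sketch of the loop this describes, on a toy 4-state chain with a linear per-state reward; the `step` function stands in for a learned dynamics model, and the gradient compares each demonstrated transition against model-generated alternatives (the tiny MDP and all names are illustrative, not a specific method's API):

```python
import numpy as np

n_states, n_actions = 4, 2
def step(s, a):                       # stands in for the learned dynamics model
    return max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)

def soft_policy(theta, beta=5.0):
    # Boltzmann policy over one-step model lookahead (a crude planner).
    pi = np.zeros((n_states, n_actions))
    for s in range(n_states):
        q = np.array([theta[step(s, a)] for a in range(n_actions)])
        pi[s] = np.exp(beta * (q - q.max())); pi[s] /= pi[s].sum()
    return pi

demos = [[(s, 1) for s in range(3)]]          # expert always moves right
theta, lr = np.zeros(n_states), 0.1
for _ in range(200):
    pi = soft_policy(theta)
    grad = np.zeros(n_states)
    for traj in demos:
        for s, a in traj:
            grad[step(s, a)] += 1.0           # demonstrated transition
            for b in range(n_actions):        # model-generated alternatives
                grad[step(s, b)] -= pi[s, b]
    theta += lr * grad                        # ascend the demo log-likelihood
print(np.round(theta, 2))                     # reward mass shifts toward the right end
```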


Reward Function Inference

The process of estimating an agent's underlying reward function from its observed behavior, often formulated as maximizing the likelihood of the demonstrated trajectories under the inferred reward.


Environmental Dynamics Model

A learned model that predicts the next state and reward given a current state and action, used to simulate trajectories in model-based reinforcement learning.
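
A minimal sketch, assuming tabular counts are sufficient: a dynamics model fitted from observed transitions that predicts the next-state distribution and mean reward for each (state, action) pair (the class and method names are illustrative, not a specific library API):

```python
import numpy as np

class TabularDynamicsModel:
    """Counts-based model of p(s' | s, a) and the mean reward r(s, a)."""
    def __init__(self, n_states, n_actions):
        self.counts = np.zeros((n_states, n_actions, n_states))
        self.reward_sum = np.zeros((n_states, n_actions))

    def fit(self, transitions):
        # transitions: iterable of (s, a, r, s_next) tuples
        for s, a, r, s_next in transitions:
            self.counts[s, a, s_next] += 1
            self.reward_sum[s, a] += r

    def predict(self, s, a):
        n = self.counts[s, a].sum()
        p_next = self.counts[s, a] / n        # empirical next-state distribution
        r_mean = self.reward_sum[s, a] / n    # empirical mean reward
        return p_next, r_mean

model = TabularDynamicsModel(n_states=3, n_actions=2)
model.fit([(0, 1, 0.0, 1), (1, 1, 1.0, 2), (0, 1, 0.0, 1)])
print(model.predict(0, 1))
```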


Plausible Trajectory Generation

The use of an environment model to create state-action sequences that are consistent with system dynamics and observed policies, serving as synthetic data for inference.
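
A minimal rollout sketch, assuming a stochastic model `model_step(s, a)` that samples a next state and a `policy(s)` returning action probabilities; both are hypothetical stand-ins for a learned model and an observed policy:

```python
import numpy as np

rng = np.random.default_rng(0)

def model_step(s, a):
    # Hypothetical learned dynamics: noisy move on a 5-state chain.
    drift = 1 if a == 1 else -1
    return int(np.clip(s + drift + rng.integers(-1, 2), 0, 4))

def policy(s):
    # Hypothetical observed policy: mostly moves right.
    return np.array([0.2, 0.8])

def generate_trajectory(s0, horizon):
    # Roll the model forward under the policy to get a plausible state-action sequence.
    traj, s = [], s0
    for _ in range(horizon):
        a = rng.choice(2, p=policy(s))
        traj.append((s, a))
        s = model_step(s, a)
    return traj

synthetic = [generate_trajectory(s0=0, horizon=6) for _ in range(3)]
print(synthetic[0])
```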


Imitation Bias

Tendency of a policy learned through inverse reinforcement learning to over-imitate demonstrated actions without generalizing to unseen states, which motivates using models to explore beyond the expert data.


Trajectory Likelihood Optimization

Method of adjusting the reward function to maximize the probability that observed expert trajectories are optimal under the inferred reward.
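
One common way to set this up (a sketch, not the only formulation): score each trajectory by its cumulative reward, treat the expert trajectory as a softmax choice among model-generated alternatives, and ascend the log-likelihood with respect to the reward parameters:

```python
import numpy as np

# Feature-based reward r_theta(s) = theta @ phi(s); trajectories are state lists.
phi = np.eye(4)                                     # one-hot state features
def traj_features(traj):
    return sum(phi[s] for s in traj)

expert = [0, 1, 2, 3]                               # demonstrated trajectory
alternatives = [[0, 0, 1, 2], [0, 1, 1, 2], expert] # model-generated candidates

theta, lr = np.zeros(4), 0.5
for _ in range(100):
    F = np.array([traj_features(t) for t in alternatives])
    logits = F @ theta
    p = np.exp(logits - logits.max()); p /= p.sum() # softmax over candidate trajectories
    # d/d_theta log p(expert) = phi(expert) - E_p[phi(traj)]
    grad = traj_features(expert) - p @ F
    theta += lr * grad
print(np.round(theta, 2))
```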


Ambiguous Reward Function

Problem where multiple different reward functions can equally explain the same expert demonstrations, requiring constraints or priors to resolve the ambiguity.
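
A tiny illustration of the ambiguity: two different reward vectors (one a scaled and shifted copy of the other) induce exactly the same greedy behavior, so demonstrations alone cannot distinguish them:

```python
import numpy as np

r1 = np.array([0.0, 1.0, 2.0, 3.0])    # one candidate reward per state
r2 = 10 * r1 - 4.0                      # scaled and shifted alternative

# Greedy one-step policy on a 4-state chain: move toward the higher-reward neighbour.
def greedy_actions(r):
    return [int(r[min(s + 1, 3)] >= r[max(s - 1, 0)]) for s in range(4)]

print(greedy_actions(r1) == greedy_actions(r2))   # True: identical behaviour
```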


Synthetic Trajectory Set

Collection of trajectories generated by the environment model, used to enrich demonstration data and improve the robustness of reward inference.


Environmental Model Error

Discrepancy between the actual environment dynamics and those predicted by the learned model, which can bias reward inference if not corrected.


Backpropagation through Model

Technique for computing gradients of the inference objective with respect to the reward parameters by propagating errors through a differentiable dynamics model.
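
A minimal sketch of the chain rule involved, using a linear, differentiable toy model s_{t+1} = A s_t + B a_t and reward r(s) = theta·s. For brevity the gradient here is taken with respect to a planned action sequence; the same backward pass through the model Jacobians is what carries gradients back to the reward parameters in model-based IRL (all matrices and names are illustrative):

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # toy state-transition matrix
B = np.array([[0.0], [1.0]])              # toy action-input matrix
theta = np.array([1.0, 0.5])              # linear reward weights

def return_and_grad(s0, actions):
    # Forward pass: simulate states; backward pass: chain rule through the model.
    states = [s0]
    for a in actions:
        states.append(A @ states[-1] + B @ a)
    G = sum(theta @ s for s in states[1:])       # simulated return
    grads, adj = [], np.zeros_like(s0)
    for t in reversed(range(len(actions))):
        adj = theta + A.T @ adj                  # dG/ds_{t+1}, accumulated backwards
        grads.append(B.T @ adj)                  # dG/da_t via the model Jacobian
    return G, list(reversed(grads))

G, g = return_and_grad(np.zeros(2), [np.array([0.0])] * 3)
print(round(G, 3), [np.round(x, 3) for x in g])
```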


Policy Space

Set of all policies π(a|s) that the agent can adopt; inverse reinforcement learning searches this space for the policy most compatible with the demonstrations.


Model-Based Monte Carlo Planning

Method using stochastic simulations of the environmental model to evaluate different candidate reward functions and select the one that best explains demonstrations.
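
A sketch of how candidate rewards might be scored this way: for each candidate, roll the stochastic model forward many times under a policy greedy for that candidate, and keep the reward whose simulated state-visitation best matches the expert's (the toy chain, helper names, and visitation statistics are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N_STATES = 4

def model_step(s, a):
    # Stochastic toy model on a 4-state chain: intended move succeeds 80% of the time.
    move = (1 if a == 1 else -1) if rng.random() < 0.8 else 0
    return int(np.clip(s + move, 0, N_STATES - 1))

def rollout_visits(reward, n_rollouts=200, horizon=6):
    # Policy greedy for this candidate reward, evaluated purely by model simulation.
    visits = np.zeros(N_STATES)
    for _ in range(n_rollouts):
        s = 0
        for _ in range(horizon):
            a = int(reward[min(s + 1, N_STATES - 1)] >= reward[max(s - 1, 0)])
            s = model_step(s, a)
            visits[s] += 1
    return visits / visits.sum()

expert_visits = np.array([0.05, 0.1, 0.25, 0.6])      # from demonstrations
candidates = [np.array([3., 2., 1., 0.]), np.array([0., 1., 2., 3.])]
scores = [-np.abs(rollout_visits(r) - expert_visits).sum() for r in candidates]
print("best candidate:", int(np.argmax(scores)))       # expect 1: reward increasing to the right
```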


Regularization Cost Function

Term added to the inference objective to penalize complex or unrealistic reward functions, favoring simpler and more generalizable solutions.


Posterior Distribution over Rewards

Bayesian approach that maintains a probability distribution over possible reward functions rather than a point estimate, making it possible to quantify uncertainty in the inferred reward.
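
A minimal Bayesian sketch over a handful of discrete reward hypotheses, assuming a Boltzmann likelihood for the demonstrated actions; the small grid of candidates stands in for a full Bayesian IRL sampler:

```python
import numpy as np

# Three candidate reward vectors over 3 states and a uniform prior over them.
candidates = [np.array([1., 0., 0.]), np.array([0., 1., 0.]), np.array([0., 0., 1.])]
prior = np.full(3, 1 / 3)

def action_likelihood(reward, s, a, beta=3.0):
    # Boltzmann choice between "left" (s-1) and "right" (s+1) on a 3-state chain.
    q = np.array([reward[max(s - 1, 0)], reward[min(s + 1, 2)]])
    return np.exp(beta * q[a]) / np.exp(beta * q).sum()

demos = [(0, 1), (1, 1)]                               # expert keeps moving right
likelihood = np.array([np.prod([action_likelihood(r, s, a) for s, a in demos])
                       for r in candidates])
posterior = prior * likelihood
posterior /= posterior.sum()
# Mass moves off the left-favouring reward; the residual ambiguity between the
# other two candidates stays explicit instead of being hidden in a point estimate.
print(np.round(posterior, 3))
```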


Simulation Horizon

Maximum number of future steps simulated by the environmental model when generating trajectories, influencing the balance between exploration and computational cost.


Model-Based Importance Sampling

Technique using the model to generate trajectories from a proposal distribution, then re-weighting them by the ratio of their likelihood under the expert policy to their likelihood under the proposal.
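
A minimal sketch of the weighting step, assuming trajectories are rolled out from a proposal policy with the model and re-weighted by the ratio of expert-policy to proposal-policy action probabilities; the two distributions below are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)

proposal = np.array([0.5, 0.5])        # behaviour used to roll out the model
expert = np.array([0.2, 0.8])          # (estimated) expert action distribution

def sample_actions(horizon=5):
    return rng.choice(2, size=horizon, p=proposal)

def importance_weight(actions):
    # w(tau) = prod_t pi_expert(a_t) / pi_proposal(a_t); the dynamics terms cancel
    # because both policies are simulated through the same model.
    return np.prod(expert[actions] / proposal[actions])

trajs = [sample_actions() for _ in range(1000)]
weights = np.array([importance_weight(a) for a in trajs])
# Self-normalised estimate of how often the expert would pick action 1:
est = np.sum(weights * np.array([a.mean() for a in trajs])) / weights.sum()
print(round(est, 2))                   # close to 0.8
```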


Maximum Entropy Method

Inference principle that, among all trajectory distributions consistent with the demonstrations, selects the least committed (maximum-entropy) one, reading no more structure into the expert's behavior than the data support and thereby avoiding overfitting.
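
In the common maximum-entropy IRL formulation (a sketch, assuming a linear reward over trajectory features f(τ)), this principle takes the form:

```latex
P(\tau \mid \theta) = \frac{\exp\!\big(\theta^\top f(\tau)\big)}{Z(\theta)},
\qquad
\nabla_\theta \log P(\tau_E \mid \theta)
  = f(\tau_E) - \mathbb{E}_{\tau \sim P(\cdot \mid \theta)}\big[f(\tau)\big]
```

so the inferred reward matches the expert's feature expectations while the trajectory distribution otherwise stays as uninformative as possible.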
