AI Glossary

A comprehensive dictionary of artificial intelligence

162 categories, 2,032 subcategories, 23,060 terms

Model-Based Inverse Reinforcement Learning

An approach that infers a reward function from expert demonstrations using an environment model to generate and evaluate plausible alternative trajectories.

Reward Function Inference

The process of estimating the reward function that underlies an agent's behavior by observing that behavior, often formulated as maximizing the likelihood of the demonstrated trajectories.

Environmental Dynamics Model

A learned model that predicts the next state and reward given a current state and action, used to simulate trajectories in model-based reinforcement learning.
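
A minimal sketch of such a model, assuming a small discrete (tabular) environment in which "learning" just means counting observed transitions: the next-state distribution is the empirical frequency of each s' after (s, a), and the reward is the running average. The class and method names are hypothetical; continuous environments typically use a neural network instead.

```python
from collections import defaultdict


class TabularDynamicsModel:
    """Count-based (maximum-likelihood) estimate of P(s' | s, a) and mean reward R(s, a)."""

    def __init__(self):
        # counts[(s, a)][s_next] = number of times s_next followed (s, a)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.reward_sum = defaultdict(float)
        self.visits = defaultdict(int)

    def update(self, s, a, r, s_next):
        """Record one observed transition (s, a, r, s')."""
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def transition_probs(self, s, a):
        """Empirical next-state distribution; empty if (s, a) was never observed."""
        n = self.visits[(s, a)]
        if n == 0:
            return {}
        return {s2: c / n for s2, c in self.counts[(s, a)].items()}

    def expected_reward(self, s, a):
        """Average observed reward for (s, a); 0.0 if unobserved."""
        n = self.visits[(s, a)]
        return self.reward_sum[(s, a)] / n if n else 0.0


model = TabularDynamicsModel()
for s, a, r, s2 in [(0, "right", 0.0, 1), (0, "right", 0.0, 1), (0, "right", 1.0, 0), (1, "left", 0.0, 0)]:
    model.update(s, a, r, s2)

print(model.transition_probs(0, "right"))  # → {1: 0.666..., 0: 0.333...}
print(model.expected_reward(0, "right"))   # → 0.333...
```

Returning an empty distribution for unseen (s, a) pairs makes the model's ignorance explicit, which matters later when simulated trajectories wander outside the data.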

Plausible Trajectory Generation

The use of an environment model to create state-action sequences that are consistent with system dynamics and observed policies, serving as synthetic data for inference.
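
The generation step can be sketched as alternating two samples: an action from a policy estimated from the demonstrations, then a next state from the learned model. Both tables below (`MODEL`, `POLICY`) are hypothetical stand-ins for learned components, and the fixed seed only makes the sketch reproducible.

```python
import random

# Hypothetical learned model: MODEL[state][action] -> list of (next_state, probability).
MODEL = {
    0: {"stay": [(0, 1.0)], "go": [(1, 0.8), (0, 0.2)]},
    1: {"stay": [(1, 1.0)], "go": [(2, 0.9), (1, 0.1)]},
    2: {"stay": [(2, 1.0)], "go": [(2, 1.0)]},
}

# Hypothetical stochastic policy estimated from demonstrations: POLICY[state] -> {action: prob}.
POLICY = {
    0: {"go": 0.9, "stay": 0.1},
    1: {"go": 0.7, "stay": 0.3},
    2: {"stay": 1.0},
}


def sample(pairs, rng):
    """Draw one outcome from an iterable of (outcome, probability) pairs."""
    outcomes, weights = zip(*pairs)
    return rng.choices(outcomes, weights=weights, k=1)[0]


def rollout(start, horizon, rng):
    """Generate one state-action trajectory consistent with POLICY and MODEL."""
    traj, s = [], start
    for _ in range(horizon):
        a = sample(POLICY[s].items(), rng)  # action from the estimated policy
        s_next = sample(MODEL[s][a], rng)   # next state from the learned dynamics
        traj.append((s, a))
        s = s_next
    return traj


rng = random.Random(0)
synthetic = [rollout(start=0, horizon=5, rng=rng) for _ in range(3)]
for traj in synthetic:
    print(traj)
```

Every consecutive pair of states in a generated trajectory has nonzero probability under the model, which is the sense in which the trajectories are "plausible".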

Imitation Bias

Tendency of an agent trained through inverse reinforcement learning to copy demonstrated actions too closely without generalizing to unseen states; environment models are used to explore beyond the expert data.

Trajectory Likelihood Optimization

Method of adjusting the reward function to maximize the probability that observed expert trajectories are optimal under the inferred reward.
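
A minimal sketch of this optimization, assuming a reward that is linear in hypothetical trajectory features, R(τ) = θ·f(τ), and a Boltzmann (maximum-entropy style) trajectory distribution P(τ | θ) ∝ exp(R(τ)) restricted to a small enumerated candidate set. Under these assumptions the gradient of the average log-likelihood is the expert's mean features minus the model's expected features; real methods estimate the normalizer with dynamic programming or sampled trajectories rather than enumeration.

```python
import math

# Hypothetical candidate trajectories summarized by features [steps near goal, collisions].
FEATURES = [
    [3.0, 0.0],
    [2.0, 0.0],
    [1.0, 1.0],
    [0.0, 2.0],
]
EXPERT = [0, 1]  # indices of the demonstrated trajectories


def traj_probs(theta):
    """P(tau | theta) proportional to exp(theta . f(tau)) over the candidate set."""
    scores = [sum(t * f for t, f in zip(theta, feats)) for feats in FEATURES]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def avg_log_likelihood(theta):
    p = traj_probs(theta)
    return sum(math.log(p[i]) for i in EXPERT) / len(EXPERT)


def gradient(theta):
    """Gradient of avg_log_likelihood: expert mean features minus expected features."""
    p = traj_probs(theta)
    dim = len(theta)
    expert_mean = [sum(FEATURES[i][k] for i in EXPERT) / len(EXPERT) for k in range(dim)]
    expected = [sum(p[j] * FEATURES[j][k] for j in range(len(FEATURES))) for k in range(dim)]
    return [e - x for e, x in zip(expert_mean, expected)]


theta = [0.0, 0.0]
for _ in range(200):
    g = gradient(theta)
    theta = [t + 0.1 * gi for t, gi in zip(theta, g)]  # plain gradient ascent

print(theta, avg_log_likelihood(theta))
```

The collision weight is driven negative because the expert never collides, while the other weight settles where the expected "near goal" count matches the expert's average.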

Ambiguous Reward Function

Problem where multiple different reward functions can equally explain the same expert demonstrations, requiring constraints or priors to resolve the ambiguity.
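
A small worked illustration, assuming a hypothetical 4-state deterministic chain solved with discounted value iteration: multiplying the reward by a positive constant or adding a constant to it leaves the optimal policy unchanged, so demonstrations generated by that policy cannot distinguish among these rewards. (The all-zero reward, under which every policy is optimal, is another classic instance of the ambiguity.)

```python
# Hypothetical deterministic chain: actions move left (-1) or right (+1), walls clip at the ends.
N_STATES, ACTIONS, GAMMA = 4, (-1, +1), 0.9


def step(s, a):
    return min(max(s + a, 0), N_STATES - 1)


def greedy_policy(reward, iters=200):
    """Value iteration with reward r(s') on arrival, then the greedy action in each state."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [max(reward[step(s, a)] + GAMMA * v[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    return [max(ACTIONS, key=lambda a: reward[step(s, a)] + GAMMA * v[step(s, a)])
            for s in range(N_STATES)]


base = [0.0, 0.0, 0.0, 1.0]
scaled = [5.0 * r for r in base]    # positive rescaling
shifted = [r + 10.0 for r in base]  # constant offset

print(greedy_policy(base), greedy_policy(scaled), greedy_policy(shifted))
# → [1, 1, 1, 1] [1, 1, 1, 1] [1, 1, 1, 1]
```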

Synthetic Trajectory Set

Collection of trajectories generated by the environment model, used to enrich demonstration data and improve the robustness of reward inference.

Environmental Model Error

Discrepancy between the actual environment dynamics and those predicted by the learned model, which can bias reward inference if not corrected.
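
One way to quantify this discrepancy, assuming both the true and the learned next-state distributions are available for a small discrete problem, is the total variation distance per (state, action) pair. The tables below are hypothetical; when the true dynamics are unknown, one-step prediction error on held-out transitions is a common practical substitute.

```python
# Hypothetical true and learned next-state distributions, keyed by (state, action).
TRUE_P = {
    (0, "go"): {1: 0.8, 0: 0.2},
    (1, "go"): {2: 0.9, 1: 0.1},
}
LEARNED_P = {
    (0, "go"): {1: 0.6, 0: 0.4},
    (1, "go"): {2: 0.9, 1: 0.1},
}


def total_variation(p, q):
    """Total variation distance: half the L1 distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


errors = {sa: total_variation(TRUE_P[sa], LEARNED_P[sa]) for sa in TRUE_P}
print(errors)                # per-(s, a) model error, about 0.2 for (0, "go") and 0.0 for (1, "go")
print(max(errors.values()))  # worst-case one-step error
```

A one-step error of this size can grow over multi-step rollouts, which is how model error ends up biasing reward inference.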

Backpropagation through Model

Technique for computing gradients of the inference objective with respect to the reward parameters by backpropagating the error through a differentiable dynamics model.

Policy Space

Set of all policies π(a|s) the agent can adopt; inverse reinforcement learning searches this space for a policy that is optimal under the inferred reward and consistent with the demonstrations.
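
A minimal sketch, assuming a hypothetical 3-state, 2-action problem and restricting attention to deterministic stationary policies (a finite subset of the full space of stochastic policies π(a|s)). It enumerates all |A|^|S| such policies and keeps those that agree with every demonstrated state-action pair.

```python
from itertools import product

STATES = ["s0", "s1", "s2"]
ACTIONS = ["left", "right"]

# Hypothetical demonstrations: (state, action) pairs observed from the expert.
DEMOS = [("s0", "right"), ("s1", "right")]

# Each deterministic stationary policy maps every state to one action: |A|^|S| policies.
all_policies = [dict(zip(STATES, choice)) for choice in product(ACTIONS, repeat=len(STATES))]

# Keep only policies that reproduce every demonstrated (state, action) pair.
consistent = [pi for pi in all_policies if all(pi[s] == a for s, a in DEMOS)]

print(len(all_policies))  # → 8
print(consistent)         # s2 was never demonstrated, so both actions there remain possible
```

The leftover freedom in undemonstrated states is why matching actions alone underdetermines behavior, and why IRL looks for a reward that explains the demonstrations instead.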

Model-Based Monte Carlo Planning

Method using stochastic simulations of the environmental model to evaluate different candidate reward functions and select the one that best explains demonstrations.
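
A minimal sketch of the idea, assuming a hypothetical 5-state chain with a known stochastic model, a uniform random rollout policy after the first action, and a Boltzmann-rational expert with inverse temperature `beta`. Each candidate reward is scored by the log-probability it assigns to the expert's actions when Q-values are estimated by Monte Carlo rollouts through the model; this scoring rule is one reasonable choice, not a canonical algorithm.

```python
import math
import random

# Hypothetical chain model: the chosen move succeeds with probability 0.8, otherwise it reverses.
N_STATES, ACTIONS, GAMMA, HORIZON = 5, (-1, +1), 0.9, 10


def model_step(s, a, rng):
    """Sample a next state from the (assumed known) stochastic environment model."""
    move = a if rng.random() < 0.8 else -a
    return min(max(s + move, 0), N_STATES - 1)


def mc_q(reward, s, a, rng, n_rollouts=200):
    """Monte Carlo estimate of Q(s, a): take a, then act uniformly at random until HORIZON."""
    total = 0.0
    for _ in range(n_rollouts):
        state, action, ret, discount = s, a, 0.0, 1.0
        for _ in range(HORIZON):
            state = model_step(state, action, rng)
            ret += discount * reward[state]
            discount *= GAMMA
            action = rng.choice(ACTIONS)
        total += ret
    return total / n_rollouts


def demo_log_likelihood(reward, demos, rng, beta=5.0):
    """Log-probability of the expert's actions under a softmax policy over the estimated Q-values."""
    ll = 0.0
    for s, a in demos:
        q = {b: mc_q(reward, s, b, rng) for b in ACTIONS}
        m = max(q.values())
        log_z = beta * m + math.log(sum(math.exp(beta * (v - m)) for v in q.values()))
        ll += beta * q[a] - log_z
    return ll


candidates = {
    "goal_left": [1.0, 0.0, 0.0, 0.0, 0.0],
    "goal_right": [0.0, 0.0, 0.0, 0.0, 1.0],
}
demos = [(0, +1), (1, +1), (2, +1), (3, +1)]  # the expert always moves right

rng = random.Random(0)
scores = {name: demo_log_likelihood(r, demos, rng) for name, r in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

More rollouts reduce the variance of each Q estimate at a proportional cost in simulation time.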

Regularization Cost Function

Term added to the inference objective to penalize complex or unrealistic reward functions, favoring simpler and more generalizable solutions.
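
For a reward parameterized by θ and N demonstrated trajectories τ_1, …, τ_N, a typical regularized objective has the form below. The penalty Ω and weight λ ≥ 0 are assumptions of this sketch: an L2 penalty favors small weights, while an L1 penalty favors sparse ones.

```latex
\mathcal{L}(\theta) = -\sum_{i=1}^{N} \log P(\tau_i \mid \theta) + \lambda\, \Omega(\theta),
\qquad \text{e.g.}\quad \Omega(\theta) = \lVert \theta \rVert_2^2 \ \text{or}\ \Omega(\theta) = \lVert \theta \rVert_1
```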

Posterior Distribution over Rewards

Bayesian approach that maintains a probability distribution over possible reward functions rather than a point estimate, which makes it possible to quantify uncertainty about the inferred reward.
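
A minimal sketch over a finite set of reward hypotheses, assuming a one-step (contextual bandit) setting so that each hypothesis gives R(s, a) directly, a uniform prior, and a Boltzmann-rational expert with inverse temperature `beta`. The hypothesis names and values are hypothetical; Bayes' rule gives P(R | D) ∝ P(D | R) P(R).

```python
import math

ACTIONS = ("A", "B")

# Hypothetical reward hypotheses: R(s, a) for two states and two actions.
HYPOTHESES = {
    "likes_A": {("s0", "A"): 1.0, ("s0", "B"): 0.0, ("s1", "A"): 1.0, ("s1", "B"): 0.0},
    "likes_B": {("s0", "A"): 0.0, ("s0", "B"): 1.0, ("s1", "A"): 0.0, ("s1", "B"): 1.0},
    "indifferent": {("s0", "A"): 0.0, ("s0", "B"): 0.0, ("s1", "A"): 0.0, ("s1", "B"): 0.0},
}
PRIOR = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}


def log_likelihood(reward, demos, beta=2.0):
    """log P(D | R) for an expert choosing a with probability proportional to exp(beta * R(s, a))."""
    total = 0.0
    for s, a in demos:
        log_z = math.log(sum(math.exp(beta * reward[(s, b)]) for b in ACTIONS))
        total += beta * reward[(s, a)] - log_z
    return total


def posterior(demos):
    """Normalized posterior over the hypotheses, computed in log space for stability."""
    log_post = {h: math.log(PRIOR[h]) + log_likelihood(r, demos) for h, r in HYPOTHESES.items()}
    m = max(log_post.values())
    unnorm = {h: math.exp(lp - m) for h, lp in log_post.items()}
    z = sum(unnorm.values())
    return {h: u / z for h, u in unnorm.items()}


demos = [("s0", "A"), ("s1", "A"), ("s0", "A")]
post = posterior(demos)
print(post)  # likes_A ≈ 0.84, indifferent ≈ 0.15, likes_B ≈ 0.002
```

The "indifferent" hypothesis keeps noticeable mass after only three demonstrations; that residual probability is exactly the uncertainty a point estimate would hide.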

Simulation Horizon

Maximum number of future steps simulated by the environmental model when generating trajectories; longer horizons capture more distant consequences but raise computational cost and let model errors compound.
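
The cost of truncation can be bounded directly, assuming rewards bounded by |r_t| ≤ R_max and a discount factor γ ∈ [0, 1): the discounted return ignored beyond horizon H is at most a geometric tail.

```latex
\left| \sum_{t=0}^{\infty} \gamma^{t} r_t - \sum_{t=0}^{H-1} \gamma^{t} r_t \right|
\le \sum_{t=H}^{\infty} \gamma^{t} R_{\max}
= \frac{\gamma^{H} R_{\max}}{1-\gamma}
```

For example, with γ = 0.9 and R_max = 1, the bound is about 1.22 at H = 20 and about 0.052 at H = 50. This covers truncation error only; errors in the learned model are separate and tend to grow with H.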

Model-Based Importance Sampling

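Technique that uses the model to generate trajectories from a proposal distribution and then weights each trajectory by the ratio of its likelihood under the expert policy to its likelihood under the proposal, correcting for the mismatch between the two distributions.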
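
A minimal sketch, assuming hypothetical state-independent policies over two actions: `PROPOSAL` generates the model rollouts and `EXPERT` stands in for the (estimated) expert policy. Because both trajectory distributions share the same model dynamics, the transition terms cancel and each weight reduces to a product of action-probability ratios; the self-normalized estimate then approximates an expert expectation, here of f(τ) = the number of "right" moves.

```python
# Hypothetical policies; state-independent for brevity.
PROPOSAL = {"left": 0.5, "right": 0.5}  # q(a|s): policy used to generate model rollouts
EXPERT = {"left": 0.2, "right": 0.8}    # pi_E(a|s): target policy


def importance_weight(actions):
    """w(tau) = prod_t pi_E(a_t|s_t) / q(a_t|s_t); shared model transition terms cancel."""
    w = 1.0
    for a in actions:
        w *= EXPERT[a] / PROPOSAL[a]
    return w


# Action sequences from model rollouts under the proposal, paired with f(tau) = number of "right" moves.
samples = [
    (["right", "right"], 2.0),
    (["right", "left"], 1.0),
    (["left", "right"], 1.0),
    (["left", "left"], 0.0),
]

weights = [importance_weight(acts) for acts, _ in samples]
estimate = sum(w * f for w, (_, f) in zip(weights, samples)) / sum(weights)  # self-normalized
print(weights, estimate)
```

Since these four samples happen to cover every two-step sequence once, the estimate recovers the exact expert expectation 2 × 0.8 = 1.6; with random samples it would only approximate it.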

Maximum Entropy Method

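Inference principle that, among all trajectory distributions matching the expert's observed feature expectations, selects the one with maximum entropy, committing to nothing beyond what the demonstrations support; this resolves ambiguity among rewards in a principled way and avoids overfitting.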
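
In the widely used formulation, assuming deterministic dynamics, trajectory features f(τ), and the empirical expert feature expectation f̃ = (1/N) Σ_i f(τ_i), the principle is a constrained entropy maximization. Its solution is an exponential-family distribution whose Lagrange multipliers θ act as linear reward weights, and the log-likelihood gradient is the gap between expert and expected features.

```latex
\begin{aligned}
&\max_{P}\; -\sum_{\tau} P(\tau)\log P(\tau)
\quad\text{s.t.}\quad \sum_{\tau} P(\tau)\,\mathbf{f}(\tau) = \tilde{\mathbf{f}}, \qquad \sum_{\tau} P(\tau) = 1 \\
&\Longrightarrow\; P(\tau \mid \theta) = \frac{\exp\big(\theta^{\top}\mathbf{f}(\tau)\big)}{Z(\theta)},
\qquad Z(\theta) = \sum_{\tau} \exp\big(\theta^{\top}\mathbf{f}(\tau)\big) \\
&\nabla_{\theta}\, \frac{1}{N}\sum_{i=1}^{N} \log P(\tau_i \mid \theta)
= \tilde{\mathbf{f}} - \mathbb{E}_{\tau \sim P(\cdot \mid \theta)}\big[\mathbf{f}(\tau)\big]
\end{aligned}
```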
