AI Glossary

A complete glossary of artificial intelligence

162 categories
2,032 subcategories
23,060 terms

Model-Based Imitation Learning

Approach in which the agent first learns a dynamic model of the environment, then uses that model to plan and to generalize the behaviors it imitates from expert demonstrations.

Dynamic Model

Learned mathematical representation of the environment's state transitions, i.e., the probability P(s'|s, a) of reaching a new state s' after taking action a in state s. Also called a dynamics model or transition model.
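
As a minimal sketch, assuming a small discrete environment, P(s'|s, a) can be estimated by counting observed transitions. The function name `fit_transition_model` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def fit_transition_model(transitions):
    """Estimate P(s' | s, a) from observed (s, a, s') triples by counting."""
    counts = defaultdict(Counter)
    for state, action, next_state in transitions:
        counts[(state, action)][next_state] += 1

    model = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalize counts into a probability distribution over next states.
        model[key] = {s_next: n / total for s_next, n in next_counts.items()}
    return model


# Hypothetical toy data: (state, action, next_state)
transitions = [
    ("A", "right", "B"),
    ("A", "right", "B"),
    ("A", "right", "A"),
    ("A", "right", "B"),
    ("B", "left", "A"),
]
model = fit_transition_model(transitions)
print(model[("A", "right")])
```

In continuous or high-dimensional environments the same role is usually played by a function approximator such as a neural network rather than a count table.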

Counterfactual Reasoning Inference

Method of inferring the expert's reward function by comparing demonstrated trajectories with nearby counterfactual trajectories (alternatives the expert could have taken but did not) to identify the expert's preferences.
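
One simple way to illustrate the comparison, assuming a reward that is linear in trajectory features, is a perceptron-style update that raises the score of the demonstration whenever a counterfactual scores at least as high. This is a swapped-in illustrative technique, not a method named by the glossary, and the feature meanings (progress, collisions) are hypothetical.

```python
def score(weights, features):
    """Linear reward of a trajectory summarized by a feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def fit_preferences(demo, counterfactuals, epochs=20):
    """Adjust weights until the demonstration outscores every counterfactual."""
    weights = [0.0] * len(demo)
    for _ in range(epochs):
        updated = False
        for cf in counterfactuals:
            if score(weights, demo) <= score(weights, cf):
                # Move weights toward the demo and away from the counterfactual.
                weights = [w + d - c for w, d, c in zip(weights, demo, cf)]
                updated = True
        if not updated:
            break
    return weights


# Hypothetical features: (progress toward goal, number of collisions)
demo = (1.0, 0.0)
counterfactuals = [(0.8, 1.0), (0.2, 0.0), (1.0, 2.0)]
weights = fit_preferences(demo, counterfactuals)
print(weights)
```

The learned weights reward progress and penalize collisions, which is the preference implied by the expert choosing the demonstrated trajectory over the alternatives.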

Model-Based Planning

Process of using the learned dynamic and reward models to simulate candidate action sequences and select the best plan or policy without direct interaction with the real environment.
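
A minimal sketch of the idea, assuming a deterministic one-dimensional world and a short horizon, is exhaustive search over action sequences inside the model. The `dynamics` and `reward` functions below are hand-written stand-ins for learned models, and all names are hypothetical.

```python
from itertools import product

ACTIONS = (-1, 0, 1)
GOAL = 3


def dynamics(state, action):
    """Stand-in for a learned dynamic model: next state given state and action."""
    return state + action


def reward(state):
    """Stand-in for a learned reward model: closer to the goal is better."""
    return -abs(state - GOAL)


def plan(state, horizon):
    """Simulate every action sequence in the model and keep the best one."""
    best_return, best_sequence = float("-inf"), None
    for sequence in product(ACTIONS, repeat=horizon):
        simulated, total = state, 0.0
        for action in sequence:
            simulated = dynamics(simulated, action)
            total += reward(simulated)
        if total > best_return:
            best_return, best_sequence = total, sequence
    return best_sequence, best_return


best_sequence, best_return = plan(state=0, horizon=3)
print(best_sequence, best_return)
```

Exhaustive search grows exponentially with the horizon, so practical planners typically sample sequences or use gradients instead.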

Model-Based Generalization

Ability of a model-based agent to adapt imitated behaviors to new situations not seen in the demonstrations by simulating hypothetical scenarios using its environment model.

Inverse Reinforcement Learning (IRL)

Process of inferring an expert's underlying reward function from their demonstrations. The recovered reward can then be used to train a reinforcement learning agent, and it is often a denser signal than a sparse task reward.
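
As a hedged sketch, the following uses a maximum-entropy style formulation over a small enumerated set of candidate trajectories: the probability of a trajectory is proportional to exp(w · φ), and the weights follow the gradient "expert features minus expected features". The candidate set, feature meanings, and function names are assumptions made for illustration.

```python
import math


def trajectory_probs(weights, features):
    """Softmax distribution over candidate trajectories under a linear reward."""
    scores = [sum(w * f for w, f in zip(weights, phi)) for phi in features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    normalizer = sum(exps)
    return [e / normalizer for e in exps]


def fit_reward(features, expert_index, lr=0.5, steps=200):
    """Gradient ascent on the log-likelihood of the expert's trajectory."""
    weights = [0.0] * len(features[0])
    for _ in range(steps):
        probs = trajectory_probs(weights, features)
        for d in range(len(weights)):
            expected = sum(p * phi[d] for p, phi in zip(probs, features))
            weights[d] += lr * (features[expert_index][d] - expected)
    return weights


# Hypothetical features per trajectory: (reaches goal, hits obstacle)
features = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
weights = fit_reward(features, expert_index=0)
probs = trajectory_probs(weights, features)
print(weights, probs)
```

Real IRL methods cannot enumerate trajectories and instead estimate the expected features by planning or sampling.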

Backpropagation Through Time (BPTT)

Algorithm used to train recurrent models, here recurrent dynamic models: the model is unrolled over the time steps of a simulated trajectory, and the loss gradients are computed by backpropagating errors backward through those steps.
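
To make the backward pass concrete, here is a minimal sketch assuming a scalar recurrent model s_{t+1} = w * s_t + a_t with a squared-error loss at every step. The model form and all names are hypothetical; the hand-derived gradient is compared against a finite-difference estimate.

```python
def rollout(w, s0, actions):
    """Unroll the scalar recurrent model s_{t+1} = w * s_t + a_t."""
    states = [s0]
    for a in actions:
        states.append(w * states[-1] + a)
    return states


def loss(w, s0, actions, targets):
    """Sum of squared prediction errors over the unrolled trajectory."""
    states = rollout(w, s0, actions)
    return sum(0.5 * (s - y) ** 2 for s, y in zip(states[1:], targets))


def bptt_grad(w, s0, actions, targets):
    """Gradient of the loss with respect to w, computed backward in time."""
    states = rollout(w, s0, actions)
    grad_w = 0.0
    d_state = 0.0  # gradient flowing into s_{t+1} from later time steps
    for t in reversed(range(len(actions))):
        d_state += states[t + 1] - targets[t]  # direct loss term at step t+1
        grad_w += d_state * states[t]          # s_{t+1} depends on w via w * s_t
        d_state *= w                           # propagate back to s_t
    return grad_w


w, s0 = 0.9, 1.0
actions = [0.1, 0.0, -0.2]
targets = [1.0, 0.5, 0.0]
analytic = bptt_grad(w, s0, actions, targets)
eps = 1e-6
numeric = (loss(w + eps, s0, actions, targets)
           - loss(w - eps, s0, actions, targets)) / (2 * eps)
print(analytic, numeric)
```

In practice an automatic differentiation library performs this backward accumulation; the manual version only shows what is being computed.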

Trajectory Optimization

Class of planning algorithms that iteratively improve an entire trajectory using gradients from the reward and dynamic models, rather than learning a value function as value-based methods do.
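
A minimal sketch, assuming known one-dimensional dynamics s_{t+1} = s_t + a_t and a quadratic cost, is plain gradient descent on the whole action sequence. The dynamics, cost weights, and step size are hypothetical stand-ins for learned models and tuned settings.

```python
def rollout(s0, actions):
    """Simulate the stand-in dynamics s_{t+1} = s_t + a_t."""
    states = [s0]
    for a in actions:
        states.append(states[-1] + a)
    return states


def cost(s0, actions, goal, action_weight=0.1):
    """Distance-to-goal cost over the trajectory plus an action penalty."""
    states = rollout(s0, actions)
    tracking = sum((s - goal) ** 2 for s in states[1:])
    effort = action_weight * sum(a * a for a in actions)
    return tracking + effort


def cost_gradient(s0, actions, goal, action_weight=0.1):
    """Gradient of the cost with respect to each action in the sequence."""
    states = rollout(s0, actions)
    grads = [0.0] * len(actions)
    downstream = 0.0  # an action shifts every later state by the same amount
    for t in reversed(range(len(actions))):
        downstream += 2.0 * (states[t + 1] - goal)
        grads[t] = downstream + 2.0 * action_weight * actions[t]
    return grads


def optimize(s0, goal, horizon=5, lr=0.02, iterations=500):
    """Improve the entire action sequence at once by gradient descent."""
    actions = [0.0] * horizon
    for _ in range(iterations):
        grads = cost_gradient(s0, actions, goal)
        actions = [a - lr * g for a, g in zip(actions, grads)]
    return actions


actions = optimize(s0=0.0, goal=1.0)
final_state = rollout(0.0, actions)[-1]
print(actions, final_state)
```

Note that no value function appears anywhere: the trajectory itself is the object being optimized.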

Behavioral Cloning (BC)

Supervised learning approach that directly models the expert policy π(a|s) by minimizing the error between the agent's actions and the expert's actions for given states.
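
As a minimal sketch, assuming scalar states and a linear policy a = k * s, the cloned policy can be fit in closed form by least squares on the expert's state-action pairs. The demonstration data and the function name `fit_linear_policy` are hypothetical.

```python
def fit_linear_policy(states, actions):
    """Least-squares gain k minimizing sum((k * s - a)^2) over demonstrations."""
    numerator = sum(s * a for s, a in zip(states, actions))
    denominator = sum(s * s for s in states)
    return numerator / denominator


# Hypothetical expert demonstrations: the expert pushes back toward zero.
expert_states = [1.0, 2.0, -1.0, 0.5]
expert_actions = [-2.0, -4.0, 2.0, -1.0]

gain = fit_linear_policy(expert_states, expert_actions)


def policy(state):
    """Cloned policy: imitates the expert's action for a given state."""
    return gain * state


print(gain, policy(3.0))
```

Because the policy is trained only on states the expert visited, its errors can compound once the agent drifts into states absent from the demonstrations.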

Hybrid BC-Model-Based

Architecture combining a behavioral model for direct imitation and an environment model for planning, where both contributions are merged to produce the agent's final action.
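
The definition does not say how the two contributions are merged. One hypothetical option, sketched here under that assumption, is a weighted average of the cloned action and the planned action; the policies, the candidate action set, and the mixing weight `alpha` are all illustrative.

```python
CANDIDATE_ACTIONS = (-1.0, -0.5, 0.0, 0.5, 1.0)


def bc_action(state):
    """Stand-in for a cloned expert policy (hypothetical gain)."""
    return -0.25 * state


def planned_action(state, goal):
    """One-step lookahead in a stand-in model s' = s + a."""
    return min(CANDIDATE_ACTIONS, key=lambda a: (state + a - goal) ** 2)


def hybrid_action(state, goal, alpha=0.5):
    """Blend imitation and planning; alpha=1 is pure BC, alpha=0 pure planning."""
    return alpha * bc_action(state) + (1.0 - alpha) * planned_action(state, goal)


print(hybrid_action(2.0, 0.0))
```

Other merging schemes are possible, for example using the cloned action as a prior that the planner refines.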
