
AI Glossary

The Complete Dictionary of Artificial Intelligence

162 categories
2,032 subcategories
23,060 terms
📖
Term

Offline imitation learning

Learning paradigm where the agent learns to imitate expert behaviors without interacting with the environment, using only a fixed set of pre-recorded demonstrations.
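
As a rough illustration of this definition, the simplest offline imitation method is behavior cloning: fit a policy directly to the recorded state-action pairs. The sketch below assumes a small discrete problem and a hypothetical `behavior_cloning` helper that picks, for each state, the action the expert chose most often. It is one minimal instance, not the only way to imitate offline.

```python
from collections import Counter, defaultdict

def behavior_cloning(demonstrations):
    """Fit a tabular policy from (state, action) pairs.

    For each state seen in the demonstrations, return the action
    the expert chose most often. No environment interaction is used.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}

# Hypothetical pre-recorded demonstrations.
demos = [("s0", "left"), ("s0", "left"), ("s0", "right"), ("s1", "right")]
policy = behavior_cloning(demos)
```

States that never appear in the demonstrations get no entry in `policy`, which is exactly the gap that the coverage-related terms in this glossary describe.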

📖
Term

Demonstration set

Static collection of trajectories or expert action examples used as the sole source of information for offline imitation learning.

📖
Term

Offline reinforcement learning

Reinforcement learning approach that learns a policy from a fixed, previously collected dataset, with no further interaction with the environment during training.

📖
Term

Importance sampling

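Statistical technique that corrects for the mismatch between the distribution that generated the data and the target policy by weighting each sample by the ratio of its probability under the target policy to its probability under the data-generating (behavior) policy.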
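
To make the weighting concrete, here is a minimal sketch of the ordinary importance sampling estimator in a one-step (bandit-style) setting. The function name, the dictionaries of action probabilities, and the sample data are all assumptions made for illustration.

```python
def importance_sampling_estimate(samples, target_probs, behavior_probs):
    """Estimate the target policy's expected reward from behavior-policy data.

    samples: list of (action, reward) pairs collected under the behavior policy.
    Each reward is weighted by target_probs[action] / behavior_probs[action].
    """
    total = 0.0
    for action, reward in samples:
        weight = target_probs[action] / behavior_probs[action]
        total += weight * reward
    return total / len(samples)

behavior = {"a": 0.5, "b": 0.5}   # policy that collected the data
target = {"a": 0.9, "b": 0.1}     # policy we want to evaluate
samples = [("a", 1.0), ("b", 0.0), ("a", 1.0), ("b", 0.0)]
estimate = importance_sampling_estimate(samples, target, behavior)
```

With these numbers the estimate is 0.9, which matches the target policy's true expected reward (0.9 × 1.0 + 0.1 × 0.0). The estimator requires the behavior policy to assign nonzero probability to every action the target policy can take.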

📖
Term

Distribution preservation

Constraint imposed on the learned policy to remain close to the demonstration distribution, thus avoiding risky extrapolations in unknown regions.
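
One common way to express such a constraint is a divergence penalty on the policy objective. The sketch below assumes a single state with discrete actions, uses KL divergence as the closeness measure, and introduces a hypothetical penalty weight `beta`. Other divergences and hard constraints are equally valid choices.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as dicts over the same actions.

    Assumes q[a] > 0 wherever p[a] > 0.
    """
    return sum(p[a] * math.log(p[a] / q[a]) for a in p if p[a] > 0.0)

def penalized_objective(policy, data_dist, q_values, beta):
    """Expected value of the policy minus beta times its divergence from the data."""
    expected_value = sum(policy[a] * q_values[a] for a in policy)
    return expected_value - beta * kl_divergence(policy, data_dist)

data_dist = {"a": 0.5, "b": 0.5}   # empirical action distribution in the demonstrations
q_values = {"a": 1.0, "b": 0.0}    # hypothetical value estimates
greedy = {"a": 1.0, "b": 0.0}      # strays far from the data
close = {"a": 0.6, "b": 0.4}       # stays near the data

score_greedy = penalized_objective(greedy, data_dist, q_values, beta=2.0)
score_close = penalized_objective(close, data_dist, q_values, beta=2.0)
```

With `beta=2.0` the policy that stays close to the data scores higher than the greedy one. With `beta=0.0` the penalty vanishes and the greedy policy wins.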

📖
Term

Offline trajectory

Complete sequence of states, actions, and rewards recorded from an expert policy, constituting the basic unit of learning data.

📖
Term

Expert policy

Reference strategy that generated the demonstrations, serving as the model to imitate and defining the desired (ideally near-optimal) behavior.

📖
Term

Offline estimator

Value or policy estimation algorithm specifically designed to work with static data without requiring interaction with the environment.

📖
Term

Conservative bias correction

Bias correction approach that prioritizes safety by penalizing under-represented actions in the demonstration data.
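
The definition does not fix a particular penalty, so the following is only one possible sketch: a count-based penalty that lowers the value estimate of actions seen rarely in the data. The function name, the `alpha` scale, and the `1 / sqrt(count + 1)` form are assumptions chosen for illustration.

```python
import math
from collections import Counter

def conservative_q(q_values, dataset_actions, alpha=1.0):
    """Subtract a penalty that shrinks as an action appears more often in the data."""
    counts = Counter(dataset_actions)
    return {
        action: q - alpha / math.sqrt(counts[action] + 1)
        for action, q in q_values.items()
    }

q_values = {"seen": 1.0, "rare": 1.2}   # raw estimates; "rare" looks better
dataset_actions = ["seen"] * 99          # "rare" never appears in the data
adjusted = conservative_q(q_values, dataset_actions)
best_action = max(adjusted, key=adjusted.get)
```

Here the unseen action's optimistic raw estimate is penalized heavily, so the well-supported action is preferred after correction.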

📖
Term

Constrained imitation learning

Method incorporating explicit constraints on the divergence between the learned policy and the data distribution to ensure stability.

📖
Term

Transition set

Data structure storing tuples (state, action, next state, reward) extracted from expert trajectories for offline training.
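
A minimal sketch of such a structure, assuming a trajectory is stored as parallel lists of states, actions, and rewards, with one more state than actions because the final state is included. The `Transition` type and `to_transitions` helper are hypothetical names.

```python
from typing import List, NamedTuple

class Transition(NamedTuple):
    state: str
    action: str
    next_state: str
    reward: float

def to_transitions(states: List[str], actions: List[str],
                   rewards: List[float]) -> List[Transition]:
    """Split one trajectory into (state, action, next state, reward) tuples."""
    if not (len(states) == len(actions) + 1 == len(rewards) + 1):
        raise ValueError("expected len(states) == len(actions) + 1 == len(rewards) + 1")
    return [
        Transition(states[t], actions[t], states[t + 1], rewards[t])
        for t in range(len(actions))
    ]

transitions = to_transitions(
    states=["s0", "s1", "s2"],
    actions=["right", "right"],
    rewards=[0.0, 1.0],
)
```

Flattening trajectories this way lets an offline learner sample individual transitions in any order, at the cost of discarding their position within the original episode.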

📖
Term

Adaptive importance weighting

Dynamic weighting technique that adjusts importance weights based on confidence in data quality in different regions of the state space.
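
The definition leaves the adjustment rule open. As one hypothetical reading, the sketch below shrinks an importance ratio toward 1 when few samples support the region it comes from, using the sample count as a stand-in for confidence. The function name, the smoothing constant `k`, and the shrinkage formula are assumptions, not a standard algorithm.

```python
def adaptive_weight(ratio, n_samples, k=10.0):
    """Shrink an importance ratio toward 1.0 when local data is scarce.

    confidence = n / (n + k) grows from 0 to 1 as the sample count grows,
    so the returned weight moves from 1.0 (no trust) toward the raw ratio.
    """
    confidence = n_samples / (n_samples + k)
    return 1.0 + confidence * (ratio - 1.0)

w_sparse = adaptive_weight(3.0, n_samples=0)      # no data: weight stays neutral
w_medium = adaptive_weight(3.0, n_samples=10)     # half confidence
w_dense = adaptive_weight(3.0, n_samples=10_000)  # close to the raw ratio
```

Shrinking weights this way trades some bias for lower variance in poorly covered regions.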

📖
Term

Coverage error

Measure quantifying the mismatch between the support of the data distribution and that of the optimal policy in offline learning.
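
There are several ways to turn this definition into a number. One simple sketch, assuming a discrete problem, measures how much probability mass the target policy places on state-action pairs that never appear in the dataset. The `coverage_error` name and the example distribution are hypothetical, and in practice the optimal policy's distribution is unknown and would have to be approximated.

```python
def coverage_error(target_dist, dataset_pairs):
    """Probability mass of the target policy outside the dataset's support.

    target_dist: dict mapping (state, action) pairs to visitation probability.
    dataset_pairs: iterable of (state, action) pairs present in the data.
    Returns 0.0 for full coverage and 1.0 for no overlap.
    """
    support = set(dataset_pairs)
    return sum(p for pair, p in target_dist.items() if pair not in support)

target_dist = {("s0", "a"): 0.5, ("s1", "b"): 0.3, ("s2", "a"): 0.2}
dataset_pairs = [("s0", "a"), ("s1", "b"), ("s1", "a")]
error = coverage_error(target_dist, dataset_pairs)
```

In this example the pair `("s2", "a")` is missing from the data, so the coverage error equals the 0.2 of probability mass placed on it.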
