AI Glossary

A complete dictionary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Decision Transformer

Transformer architecture that casts offline reinforcement learning as a conditional sequence-modeling problem, autoregressively predicting actions from past states, past actions, and returns-to-go (the cumulative reward still to be collected).
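
The return-to-go that conditions each prediction can be computed from a recorded reward sequence with a single backward pass. The following is a minimal sketch; the function name `returns_to_go` and the optional `gamma` parameter are illustrative assumptions, with `gamma=1.0` giving the plain undiscounted sum.

```python
def returns_to_go(rewards, gamma=1.0):
    """Return R_t = sum over k >= t of gamma**(k - t) * r_k, for every t.

    `gamma` is an illustrative knob; gamma=1.0 gives the undiscounted sum.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the sum already computed for t + 1.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


rewards = [1.0, 0.0, 2.0, 1.0]
print(returns_to_go(rewards))  # [4.0, 3.0, 3.0, 1.0]
```

The first entry is the total return of the episode. Each later entry drops the rewards that have already been collected.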

Trajectory Modeling

Approach that models complete trajectories (states, actions, rewards) as ordered token sequences, so that policy learning in offline RL reduces to sequence prediction.
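
As a rough illustration, a trajectory can be flattened into one ordered token list. The sketch below assumes the (return-to-go, state, action) ordering used by the Decision Transformer; the function name and the `(kind, timestep, value)` tuple layout are hypothetical choices made for readability.

```python
def to_token_sequence(returns_to_go, states, actions):
    """Interleave per-timestep values as R_0, s_0, a_0, R_1, s_1, a_1, ..."""
    if not (len(returns_to_go) == len(states) == len(actions)):
        raise ValueError("all three lists must have one entry per timestep")
    tokens = []
    for t, (g, s, a) in enumerate(zip(returns_to_go, states, actions)):
        # Three tokens per timestep, each tagged with its kind and timestep.
        tokens.append(("return_to_go", t, g))
        tokens.append(("state", t, s))
        tokens.append(("action", t, a))
    return tokens


tokens = to_token_sequence([3.0, 2.0], ["s0", "s1"], ["left", "right"])
for kind, t, value in tokens:
    print(t, kind, value)
```

In a real model each token would be embedded as a vector. The tuples here only make the ordering visible.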

GPT-like Architecture

Neural network structure based on the transformer decoder with causal attention, adapted for autoregressive prediction in sequence tasks.
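
Causal attention means that position i can only look at positions 0 through i. The following standard-library sketch shows single-head scaled dot-product attention with that restriction. It leaves out learned projections, multiple heads, and batching, and the function name is an assumption for illustration.

```python
import math


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of T vectors (lists of floats). Output i is a
    weighted average of v[0..i] only, so later positions stay invisible.
    """
    d = len(q[0])
    out = []
    for i in range(len(q)):
        # Scores against positions 0..i; positions after i are masked out.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(causal_attention(x, x, x)[0])  # [1.0, 0.0]: position 0 sees only itself
```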

Policy Extraction

Process of deriving a decision policy from a trained sequence model, where the transformer generates actions conditioned on states and desired returns.
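
At evaluation time, this conditioning is usually driven by a loop that starts from a target return and subtracts each observed reward. The sketch below shows that loop under stated assumptions: `stub_policy` is a hand-written stand-in for a trained transformer, and `reset` and `step` are a hypothetical toy environment rather than any real library API.

```python
def rollout(policy, reset, step, target_return, max_steps=10):
    """Run one episode, conditioning the policy on the remaining return."""
    state = reset()
    return_to_go = target_return
    history = []  # (return_to_go, state, action) triples seen so far
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(history, return_to_go, state)
        next_state, reward, done = step(state, action)
        history.append((return_to_go, state, action))
        total_reward += reward
        return_to_go -= reward  # what is still left to collect
        state = next_state
        if done:
            break
    return total_reward, history


# Hypothetical toy environment: moving right (+1) earns 1.0 per step,
# and the episode ends on reaching position 3.
def reset():
    return 0


def step(state, action):
    next_state = state + action
    reward = 1.0 if action == 1 else 0.0
    return next_state, reward, next_state >= 3


# Stand-in for a trained model: keep moving while return is still requested.
def stub_policy(history, return_to_go, state):
    return 1 if return_to_go > 0 else 0


total, history = rollout(stub_policy, reset, step, target_return=2.0)
print(total)  # 2.0
```

Changing `target_return` changes the behavior without any retraining. That property is what the extracted policy is meant to provide.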

Action Prediction

Main task of the Decision Transformer: predicting the action to take at step t given the trajectory so far (including the state at step t) and the desired return-to-go.
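
Action prediction is trained as ordinary supervised learning against the actions recorded in the dataset, typically with cross-entropy for discrete actions and mean-squared error for continuous ones. The sketch below shows the discrete case for a single timestep. The function name and the raw-logit input are illustrative assumptions.

```python
import math


def action_cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and the recorded action index."""
    m = max(logits)
    # log-sum-exp, shifted by the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


# The dataset action is index 2; a model that favors it gets a lower loss.
print(action_cross_entropy([0.0, 0.0, 3.0], target=2))
print(action_cross_entropy([3.0, 0.0, 0.0], target=2))
```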

State Representation

Vector encoding of the environment state integrated into the transformer's input sequence, capturing relevant information for decision-making.

Trajectory Transformer

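Sequence-modeling approach closely related to the Decision Transformer that explicitly models the joint distribution over complete trajectories (states, actions, rewards) and plans over it, for example with beam search, to generate consistent action sequences.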

Context Length

Maximum number of tokens that the transformer can attend to at once. In the Decision Transformer each timestep contributes three tokens (return-to-go, state, action), so a context of K timesteps corresponds to 3K tokens.
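
In practice the limit is enforced by keeping only the most recent timesteps before each prediction. The sketch below assumes three tokens per timestep (return-to-go, state, action), as in the Decision Transformer. The function name and the triple-based history format are hypothetical.

```python
def context_window(history, K):
    """Keep the last K timesteps and flatten them into a token list.

    `history` is a list of (return_to_go, state, action) triples.
    """
    if K < 1:
        raise ValueError("K must be at least 1")
    recent = history[-K:]  # shorter histories are returned unchanged
    return [token for triple in recent for token in triple]


history = [(5.0 - t, f"s{t}", f"a{t}") for t in range(5)]
window = context_window(history, K=2)
print(len(window))  # 6 tokens: 2 timesteps x 3 tokens each
print(window[0])    # 2.0, the return-to-go at timestep 3
```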

Transformer Decoder

Main component of the Decision Transformer, using causally masked self-attention so that each predicted action depends only on earlier tokens in the sequence.

Sequence Conditioning

Strategy in which future predictions are conditioned on the complete sequence of past events rather than on the current state alone.

Offline Dataset

Static dataset of trajectories (states, actions, rewards) collected by one or more behavior policies and used for training without further interaction with the environment.
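
Training batches are commonly built by cutting fixed-length windows out of the stored trajectories. The sketch below shows one such scheme under stated assumptions: the dictionary layout, the zero padding value, and the left-padding convention are illustrative choices, not a fixed standard.

```python
import random


def sample_window(dataset, K, rng):
    """Sample a window of at most K timesteps, left-padded to length K."""
    traj = rng.choice(dataset)
    T = len(traj["states"])
    start = rng.randrange(T)
    end = min(start + K, T)
    n = end - start
    pad = K - n
    return {
        "states": [0] * pad + traj["states"][start:end],
        "actions": [0] * pad + traj["actions"][start:end],
        "rewards": [0.0] * pad + traj["rewards"][start:end],
        # 1 marks real timesteps, 0 marks padding to be ignored in the loss.
        "mask": [0] * pad + [1] * n,
    }


# Tiny hand-made dataset with two trajectories of different lengths.
dataset = [
    {"states": [0, 1, 2, 3], "actions": [1, 1, 1, 0], "rewards": [0.0, 0.0, 0.0, 1.0]},
    {"states": [0, 1], "actions": [1, 0], "rewards": [0.0, 0.5]},
]
rng = random.Random(0)
batch = [sample_window(dataset, K=3, rng=rng) for _ in range(4)]
print(batch[0]["mask"])
```

Because the dataset is static, the sampler only reads from it and never queries an environment.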
