AI Terminology
A Complete Dictionary of Artificial Intelligence
Decision Transformer
Transformer architecture that frames offline reinforcement learning as a conditional sequence modeling problem: it autoregressively predicts actions from past states, past actions, and returns-to-go (the sum of rewards from each step onward).
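The return signal that conditions the model is usually computed per timestep as a return-to-go. The sketch below shows one way to compute it from a reward list; the function name `returns_to_go` and the optional discount `gamma` are illustrative assumptions (the undiscounted case is `gamma=1.0`).

```python
def returns_to_go(rewards, gamma=1.0):
    """Return R where R[t] = sum over k >= t of gamma**(k - t) * rewards[k]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the sum already computed for t + 1.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


rewards = [1.0, 0.0, 2.0]
print(returns_to_go(rewards))  # [3.0, 2.0, 2.0]
```

The backward pass keeps the computation linear in the trajectory length.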
Trajectory Modeling
Approach that treats each complete trajectory (states, actions, rewards) as a single token sequence and learns a policy in offline RL by fitting a sequence model to it.
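As a rough illustration, a trajectory can be flattened into one sequence by interleaving the per-timestep elements. The sketch below follows the Decision Transformer ordering (return-to-go, state, action); the function name `to_token_sequence` and the tuple layout are assumptions made for the example, not a fixed format.

```python
def to_token_sequence(rtgs, states, actions):
    """Interleave per-timestep elements into one flat sequence of tagged tokens."""
    if not (len(rtgs) == len(states) == len(actions)):
        raise ValueError("rtgs, states and actions must have the same length")
    tokens = []
    for t, (g, s, a) in enumerate(zip(rtgs, states, actions)):
        # Each timestep contributes three tokens, always in this order.
        tokens.append(("rtg", t, g))
        tokens.append(("state", t, s))
        tokens.append(("action", t, a))
    return tokens


tokens = to_token_sequence([3.0, 2.0], ["s0", "s1"], ["a0", "a1"])
print(len(tokens))   # 6
print(tokens[:3])    # [('rtg', 0, 3.0), ('state', 0, 's0'), ('action', 0, 'a0')]
```

In a real model each tagged element would be passed through its own embedding layer before entering the transformer.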
GPT-like Architecture
Neural network structure based on the transformer decoder with causal attention, adapted for autoregressive prediction in sequence tasks.
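Causal attention can be pictured as a lower-triangular mask applied before the softmax, so that position i only receives weight from positions 0..i. The following is a minimal pure-Python sketch; `causal_mask` and `masked_softmax` are illustrative names, and real implementations operate on batched tensors instead of nested lists.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i is allowed to attend to position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax over the allowed positions; masked positions get weight 0."""
    weights = []
    for row, allowed in zip(scores, mask):
        peak = max(s for s, ok in zip(row, allowed) if ok)  # for numerical stability
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


scores = [[0.0] * 3 for _ in range(3)]  # uniform scores, so only the mask matters
weights = masked_softmax(scores, causal_mask(3))
print(weights[0])  # [1.0, 0.0, 0.0]
print(weights[1])  # [0.5, 0.5, 0.0]
```

With uniform scores, each row spreads its weight evenly over the positions it is allowed to see, which makes the effect of the mask easy to read off.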
Policy Extraction
Process of deriving a decision policy from a trained sequence model, where the transformer generates actions conditioned on states and desired returns.
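At evaluation time, the usual procedure is to pick a target return, feed it with the current state, take the predicted action, and subtract the observed reward from the target before the next step. The sketch below shows that loop with a toy environment and a stand-in `toy_model`; every name here is hypothetical, and a trained transformer would take the place of `toy_model`.

```python
def rollout(model, env_reset, env_step, target_return, max_steps):
    """Roll out a return-conditioned policy; returns (total reward, actions taken)."""
    state = env_reset()
    rtgs, states, actions = [target_return], [state], []
    total = 0.0
    for _ in range(max_steps):
        action = model(rtgs, states, actions)
        state, reward, done = env_step(state, action)
        actions.append(action)
        total += reward
        if done:
            break
        rtgs.append(rtgs[-1] - reward)  # the remaining return still to collect
        states.append(state)
    return total, actions


# Toy environment (assumption): 5 steps, reward equals the action (0 or 1).
def env_reset():
    return 0


def env_step(state, action):
    nxt = state + 1
    return nxt, float(action), nxt >= 5


# Stand-in for a trained model: act only while some return is still requested.
def toy_model(rtgs, states, actions):
    return 1 if rtgs[-1] > 0 else 0


print(rollout(toy_model, env_reset, env_step, target_return=3.0, max_steps=10))
# (3.0, [1, 1, 1, 0, 0])
```

Changing `target_return` changes the behavior without retraining, which is the property that return conditioning is meant to provide.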
Action Prediction
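Main task of the Decision Transformer: predicting the action at step t given the state at step t, the desired return-to-go, and the preceding context. The predicted action imitates behavior in the data that achieved the conditioned return, so it is not guaranteed to be optimal.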
State Representation
Vector encoding of the environment state integrated into the transformer's input sequence, capturing relevant information for decision-making.
Trajectory Transformer
Related sequence-modeling approach, developed separately from the Decision Transformer, that models the joint distribution over discretized states, actions, and rewards in a trajectory and uses beam search to plan consistent action sequences.
Context Length
Maximum number of tokens (returns-to-go, states, actions) that the transformer can attend to at once. In the Decision Transformer, a context of K timesteps corresponds to 3K tokens.
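In practice a history longer than the context is cut down to its most recent timesteps, and a shorter one is padded to a fixed length. The helper below is a hedged sketch of that step; `fit_to_context`, the left-padding convention, and the 0/1 mask are assumptions chosen for illustration.

```python
def fit_to_context(items, context_len, pad_value):
    """Keep the last context_len items, left-padding shorter histories.

    Returns (padded_items, mask) where mask is 1 for real entries and 0 for padding.
    """
    if context_len <= 0:
        raise ValueError("context_len must be positive")
    recent = items[-context_len:]
    n_pad = context_len - len(recent)
    padded = [pad_value] * n_pad + recent
    mask = [0] * n_pad + [1] * len(recent)
    return padded, mask


print(fit_to_context([1, 2, 3, 4, 5], 3, 0))  # ([3, 4, 5], [1, 1, 1])
print(fit_to_context([7], 3, 0))              # ([0, 0, 7], [0, 0, 1])
```

The same helper would be applied to the return-to-go, state, and action histories separately, and the mask tells the attention layers which positions to ignore.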
Transformer Decoder
Core component of the Decision Transformer: a stack of causally masked self-attention layers that generates actions one step at a time.
Sequence Conditioning
Strategy in which predictions are conditioned on the full sequence of past events in the context rather than on the current state alone.
Offline Dataset
Static dataset of trajectories (states, actions, rewards) collected by one or more behavior policies and used for training without further interaction with the environment.
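One simple way to picture such a dataset is a list of trajectories, each stored as parallel lists, from which training draws fixed-length windows. The sketch below assumes that layout; the dictionary keys and the name `sample_window` are hypothetical and do not correspond to any particular library's format.

```python
import random


def sample_window(dataset, window, rng):
    """Pick a random trajectory and return a contiguous slice of up to `window` steps."""
    traj = rng.choice(dataset)
    length = len(traj["rewards"])
    # Trajectories shorter than the window are returned whole.
    start = rng.randrange(0, max(1, length - window + 1))
    end = min(start + window, length)
    return {key: values[start:end] for key, values in traj.items()}


# Tiny hand-written dataset (illustrative values only).
dataset = [
    {"states": [0, 1, 2, 3], "actions": [1, 0, 1, 1], "rewards": [1.0, 0.0, 1.0, 1.0]},
    {"states": [0, 1], "actions": [0, 0], "rewards": [0.0, 0.0]},
]

batch = sample_window(dataset, window=3, rng=random.Random(0))
print(len(batch["states"]), len(batch["actions"]), len(batch["rewards"]))
```

Because the dataset is static, the sampler never queries the environment; all training signal comes from the stored trajectories.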