BERT Architecture
Pooling Layer
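Final layer that aggregates token representations into a single fixed-size vector for classification tasks. BERT's standard pooler takes the final hidden state of the [CLS] token and passes it through a dense layer with tanh activation; mean pooling over all non-padding tokens is a common alternative.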
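The two pooling strategies can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not BERT's actual implementation: the function names `cls_pooling` and `mean_pooling`, the toy tensor shapes, and the `attention_mask` convention (1 for real tokens, 0 for padding) are all chosen here for the example, and the dense + tanh projection that BERT's own pooler applies to the [CLS] vector is left out for brevity.

```python
import numpy as np

def cls_pooling(hidden_states):
    """Return the first-token ([CLS]) vector of each sequence.

    hidden_states: array of shape (batch, seq_len, hidden).
    """
    return hidden_states[:, 0, :]

def mean_pooling(hidden_states, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    attention_mask: array of shape (batch, seq_len), 1 = real token, 0 = padding.
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)  # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                 # guard against all-padding rows
    return summed / counts

# Toy batch: 2 sequences, 3 token positions, hidden size 2 (hypothetical values).
hidden_states = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [9.0, 9.0]],  # last position is padding
])
attention_mask = np.array([[1, 1, 1],
                           [1, 1, 0]])

cls_vec = cls_pooling(hidden_states)                    # shape (2, 2)
mean_vec = mean_pooling(hidden_states, attention_mask)  # shape (2, 2)
print(cls_vec)
print(mean_vec)
```

Masking matters for mean pooling: without it, padded positions would be averaged in and the pooled vector would depend on how much padding the batch happens to contain.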