Multimodal Transformers
Flamingo
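Vision-language model that connects a frozen pre-trained vision encoder to a frozen pre-trained language model through newly trained gated cross-attention layers. Enables few-shot learning on multimodal understanding tasks from a handful of in-context examples, without fine-tuning the pre-trained weights.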
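As a rough illustration of the attention modules described above, the sketch below shows a tanh-gated cross-attention step in which text tokens attend to visual tokens. It is a simplification under stated assumptions, not Flamingo's actual implementation: it uses a single head, identity query/key/value projections, no feed-forward sublayer, and plain Python lists. The function names (`cross_attend`, `gated_cross_attention`) and the toy inputs are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(text, visual):
    # Single-head scaled dot-product attention: each text token is a query,
    # visual tokens serve as both keys and values.
    # Assumption: learned projections are omitted (identity) for brevity.
    d = len(text[0])
    out = []
    for q in text:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in visual]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, visual))
                    for i in range(d)])
    return out

def gated_cross_attention(text, visual, alpha):
    # Residual update scaled by tanh(alpha). With alpha = 0 the gate is
    # closed and the text features pass through unchanged.
    gate = math.tanh(alpha)
    attended = cross_attend(text, visual)
    return [[t + gate * a for t, a in zip(t_row, a_row)]
            for t_row, a_row in zip(text, attended)]

# Toy inputs: 2 text tokens and 3 visual tokens, feature dimension 2.
text = [[1.0, 0.0], [0.0, 1.0]]
visual = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]

out_closed = gated_cross_attention(text, visual, alpha=0.0)  # gate closed
out_open = gated_cross_attention(text, visual, alpha=1.0)    # gate partly open
```

Starting the gate parameter at zero means the new layers initially leave the language model's behaviour untouched, which is one reason the pre-trained weights can stay frozen while only the added modules are trained.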