Multi-Modal Transformers
Flamingo Model
80-billion-parameter vision-language model that connects a frozen pre-trained vision encoder to a frozen pre-trained language model through newly trained, gated cross-attention layers, so neither backbone needs full retraining.
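
A minimal sketch of what attentional gating can look like, assuming a tanh gate on a cross-attention residual branch. The function names, the single attention head, and the omission of learned projection matrices are simplifications made for illustration; this is not the model's actual implementation. The point it shows: with the gate parameter `alpha` at zero, the layer passes text tokens through unchanged, so the language model's existing behavior is preserved until the gate is trained open.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_tokens, visual_tokens):
    """Single-head dot-product attention: text queries over visual keys/values.

    Assumption: learned query/key/value projections are left out (identity)
    to keep the sketch short.
    """
    dim = len(text_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for query in text_tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in visual_tokens]
        weights = softmax(scores)
        attended.append([
            sum(w * value[i] for w, value in zip(weights, visual_tokens))
            for i in range(dim)
        ])
    return attended


def gated_cross_attention(text_tokens, visual_tokens, alpha):
    """Add visual information to text tokens, scaled by tanh(alpha).

    alpha stands in for a learnable scalar; alpha = 0 closes the gate,
    making the layer an identity map on the text tokens.
    """
    gate = math.tanh(alpha)
    attended = cross_attention(text_tokens, visual_tokens)
    return [
        [t + gate * a for t, a in zip(token, att)]
        for token, att in zip(text_tokens, attended)
    ]


text = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical text token embeddings
visual = [[0.5, 0.5], [1.0, -1.0]]   # hypothetical visual token embeddings

closed = gated_cross_attention(text, visual, alpha=0.0)   # gate closed
opened = gated_cross_attention(text, visual, alpha=1.0)   # gate partly open
```

Here `closed` equals `text`, while `opened` mixes in visual features. Starting with the gate closed is one way to insert new layers into a frozen language model without disturbing it at the start of training.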