Multi-Modal Transformers
GIT
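GIT (Generative Image-to-text Transformer) pairs a single image encoder with a single text decoder and trains them with one language-modeling objective. It is used for image captioning and visual question answering (VQA), where it reported state-of-the-art results on several benchmarks at the time of publication.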
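To make the encoder-decoder arrangement more concrete, here is a minimal sketch of the kind of attention mask such a design can use when image tokens and text tokens share one decoder sequence. It assumes that image tokens attend to each other freely while text tokens see all image tokens plus earlier text; the function name `build_seq2seq_mask` and its parameters are hypothetical and are not taken from any GIT codebase.

```python
from typing import List


def build_seq2seq_mask(num_image_tokens: int, num_text_tokens: int) -> List[List[bool]]:
    """Return a square mask where mask[i][j] is True if position i may attend to j.

    Assumed layout (illustrative only): image tokens come first, text tokens follow.
    """
    total = num_image_tokens + num_text_tokens
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < num_image_tokens:
                # Every position, image or text, may attend to every image token.
                mask[i][j] = True
            elif i >= num_image_tokens and j <= i:
                # Text positions attend causally: to themselves and earlier text only.
                mask[i][j] = True
    return mask
```

Calling `build_seq2seq_mask(2, 3)` gives a 5x5 mask in which the two image rows never look at text, and each text row sees both image tokens plus the text generated so far. This is what lets the decoder produce a caption or answer one token at a time while conditioning on the whole image.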