AI Dictionary
A complete dictionary of artificial intelligence
Token Fusion
Technique that concatenates or fuses tokens from different modalities before passing them through transformer layers. Enables early integration of multimodal information for a richer joint representation.
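A minimal sketch of early token fusion, assuming text and image tokens have already been projected to the same model width (the shapes and names here are illustrative, not from any specific model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings: 4 text tokens and 6 image-patch tokens,
# both already projected to a shared model width of 8.
text_tokens = rng.normal(size=(4, 8))
image_tokens = rng.normal(size=(6, 8))

# Early fusion: concatenate along the sequence axis so that subsequent
# transformer layers attend over both modalities jointly.
fused = np.concatenate([text_tokens, image_tokens], axis=0)
print(fused.shape)  # (10, 8)
```

After this step, ordinary self-attention over `fused` mixes information across modalities at every layer.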
ALIGN
Contrastive image-text model trained on over one billion automatically filtered, noisy alt-text pairs. Demonstrates that data scale can compensate for label noise in large-scale multimodal learning.
Flamingo
Vision-language model that bridges frozen pre-trained vision and language models with interleaved gated cross-attention layers. Enables few-shot learning on complex multimodal understanding tasks without full retraining.
Cross-Modal Representation
Shared vector space where embeddings from different modalities are semantically aligned to enable cross-modal interactions. Facilitates knowledge transfer and unified understanding between text, images, audio, and video.
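A toy illustration of a shared cross-modal space, assuming the encoders have already been trained so that matching pairs land near each other (the embedding values below are made up for demonstration):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between every row of a and every row of b.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Toy shared space (dim 4): hand-crafted embeddings standing in for the
# outputs of aligned image and text encoders.
image_emb = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0]])
text_emb  = np.array([[0.9, 0.1, 0.0, 0.0],   # caption for image 0
                      [0.1, 0.9, 0.0, 0.0]])  # caption for image 1

sim = cosine_sim(text_emb, image_emb)
# Cross-modal retrieval: each caption's nearest image is its true match.
print(sim.argmax(axis=1))  # [0 1]
```

Because both modalities share one metric space, the same similarity function supports text-to-image and image-to-text retrieval.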
MViT (Multiscale Vision Transformer)
Video transformer architecture that builds a feature pyramid, combining features at multiple temporal and spatial scales. Uses pooling attention to shrink the token grid between stages, efficiently capturing long-range relationships in video sequences.
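A simplified sketch of the multiscale idea: between stages, the spatial token grid is pooled so that later layers operate on fewer, coarser tokens. This shows only the pooling step, not the full pooling-attention mechanism; the function name and shapes are illustrative:

```python
import numpy as np

def pool_spatial(x, stride=2):
    # x: (H, W, C) token grid. Strided average pooling mimics how a
    # multiscale video transformer reduces spatial resolution between
    # stages (in the real model, channel width grows as resolution shrinks).
    H, W, C = x.shape
    x = x[:H - H % stride, :W - W % stride]
    x = x.reshape(H // stride, stride, W // stride, stride, C)
    return x.mean(axis=(1, 3))

tokens = np.arange(8 * 8 * 4, dtype=float).reshape(8, 8, 4)
stage2 = pool_spatial(tokens)   # coarser grid: (4, 4, 4)
stage3 = pool_spatial(stage2)   # coarser still: (2, 2, 4)
print(stage2.shape, stage3.shape)
```

Each stage sees a smaller grid, so attention cost drops sharply while the receptive field per token grows.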
Multi-Head Cross Attention
Extension of multi-head attention in which each head learns a different cross-modal correspondence, allowing richer and more diverse capture of inter-modal relationships in multimodal transformer architectures.
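A minimal numpy sketch of multi-head cross attention: queries come from one modality (here text) while keys and values come from another (here image tokens), and each head has its own projections. All dimensions and weight initializations are illustrative assumptions:

```python
import numpy as np

def cross_attention(q_tokens, kv_tokens, n_heads, rng):
    # Queries from one modality attend to keys/values from another.
    # Each head gets independent random projections (untrained, for
    # illustration only).
    d = q_tokens.shape[-1]
    dh = d // n_heads
    outs = []
    for _ in range(n_heads):
        Wq = rng.normal(size=(d, dh)) / np.sqrt(d)
        Wk = rng.normal(size=(d, dh)) / np.sqrt(d)
        Wv = rng.normal(size=(d, dh)) / np.sqrt(d)
        q, k, v = q_tokens @ Wq, kv_tokens @ Wk, kv_tokens @ Wv
        scores = q @ k.T / np.sqrt(dh)            # (n_q, n_kv)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over kv
        outs.append(weights @ v)                  # (n_q, dh)
    # Concatenate head outputs back to the full model width.
    return np.concatenate(outs, axis=-1)

rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))    # 4 text tokens, width 8
image = rng.normal(size=(6, 8))   # 6 image-patch tokens
out = cross_attention(text, image, n_heads=2, rng=rng)
print(out.shape)  # (4, 8)
```

Each of the two heads attends to the image tokens with its own learned (here random) projections, so different heads can specialize in different cross-modal correspondences.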