Expert

Multimodal AI Integrator

Develops AI solutions that combine text, image, audio, and video for complex analyses

You are an expert in multimodal AI. Develop a solution that combines several modalities for: [TARGET APPLICATION + MULTIMODAL DATA TYPES]

Complete multimodal AI solution:

1. **Multimodal Data Pipeline**:
   - Text preprocessing and tokenization
   - Image preprocessing and augmentation
   - Audio preprocessing and feature extraction
   - Video frame extraction and temporal analysis
   - Synchronization and alignment of modalities

2. **Model Architecture**:
   - Multimodal transformer design
   - Cross-attention mechanisms
   - Fusion strategies (early, late, intermediate)
   - Modality-specific encoders
   - Shared representation space

3. **Training Strategy**:
   - Multitask learning approaches
   - Curriculum learning progression
   - Multimodal data augmentation
   - Transfer learning from pretrained models
   - Loss function design for multimodal tasks

4. **Inference Pipeline**:
   - Real-time processing optimization
   - Model quantization and compression
   - Edge deployment considerations
   - Batch processing strategies
   - Latency and throughput optimization

5. **Specific Applications**:
   - Visual question answering (VQA)
   - Image captioning with context
   - Video analysis with audio
   - Document understanding (OCR + layout)
   - Multimodal sentiment analysis

6. **Evaluation Framework**:
   - Modality-specific metrics
   - Cross-modal consistency measures
   - Human evaluation protocols
   - A/B testing methodologies

Provide the complete architecture, PyTorch/TensorFlow code samples, and deployment strategies.
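
As an illustration of the kind of output this prompt asks for, the late fusion strategy named in item 2 can be sketched in plain Python. This is a minimal sketch, assuming each modality-specific model already outputs class probabilities over the same set of classes. The function name `late_fusion`, the modality keys, and the weights are hypothetical and chosen only for this example; a real system would more likely learn the weights and operate on PyTorch or TensorFlow tensors.

```python
from typing import Dict, List


def late_fusion(
    probs_by_modality: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Fuse per-modality class probabilities with a weighted average.

    Hypothetical helper: each modality is scored independently, and the
    resulting probability vectors are combined only at the decision stage.
    """
    if not probs_by_modality:
        raise ValueError("at least one modality is required")

    n_classes = len(next(iter(probs_by_modality.values())))

    # Normalize the weights over the modalities that are actually present.
    total_weight = sum(weights[m] for m in probs_by_modality)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * n_classes
    for modality, probs in probs_by_modality.items():
        if len(probs) != n_classes:
            raise ValueError(f"class count mismatch for modality {modality!r}")
        w = weights[modality] / total_weight
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused


# Example: text and image classifiers disagree; equal weights average them.
fused = late_fusion(
    {"text": [0.8, 0.2], "image": [0.4, 0.6]},
    {"text": 1.0, "image": 1.0},
)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Late fusion keeps the encoders independent, so a missing modality can simply be left out of the dictionary. Early and intermediate fusion instead combine raw inputs or hidden features before the decision stage, which allows cross-modal interaction at the cost of tighter coupling.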