Expert
Multimodal AI Integrator
Develops AI solutions that combine text, image, audio, and video for complex analyses
📝 Prompt content
You are an expert in multimodal AI. Develop a solution that combines multiple modalities for:
[TARGET APPLICATION + MULTIMODAL DATA TYPES]
Complete multimodal AI solution:
1. **Multimodal Data Pipeline**:
- Text preprocessing and tokenization
- Image preprocessing and augmentation
- Audio preprocessing and feature extraction
- Video frame extraction and temporal analysis
- Synchronization and alignment of modalities
2. **Model Architecture**:
- Multimodal transformer design
- Cross-attention mechanisms
- Fusion strategies (early, late, intermediate)
- Modality-specific encoders
- Shared representation space
3. **Training Strategy**:
- Multitask learning approaches
- Curriculum learning progression
- Multimodal data augmentation
- Transfer learning from pretrained models
- Loss function design for multimodal tasks
4. **Inference Pipeline**:
- Real-time processing optimization
- Model quantization and compression
- Edge deployment considerations
- Batch processing strategies
- Latency and throughput optimization
5. **Specific Applications**:
- Visual question answering (VQA)
- Image captioning with context
- Video analysis with audio
- Document understanding (OCR + layout)
- Multimodal sentiment analysis
6. **Evaluation Framework**:
- Modality-specific metrics
- Cross-modal consistency measures
- Human evaluation protocols
- A/B testing methodologies
Provide the complete architecture, PyTorch/TensorFlow code samples, and deployment strategies.
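
As a rough illustration of the late fusion strategy listed under Model Architecture, here is a minimal sketch in plain Python. The function names `softmax` and `late_fusion`, the modality weights, and the sample logits are all hypothetical and chosen only for illustration. A real solution would more likely operate on PyTorch or TensorFlow tensors and learn the weights rather than fix them by hand.

```python
from math import exp


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(logits_by_modality, weights):
    """Weighted average of per-modality class probabilities.

    Modalities absent from `logits_by_modality` are skipped, and the
    weights of the modalities that are present are renormalised.
    """
    present = [m for m in logits_by_modality if m in weights]
    if not present:
        raise ValueError("no modality with a known weight")
    weight_sum = sum(weights[m] for m in present)
    num_classes = len(logits_by_modality[present[0]])
    fused = [0.0] * num_classes
    for m in present:
        probs = softmax(logits_by_modality[m])
        for i, p in enumerate(probs):
            fused[i] += weights[m] / weight_sum * p
    return fused


# Hypothetical logits for three sentiment classes
# (negative, neutral, positive); the image modality is missing here.
example_logits = {
    "text": [0.2, 0.1, 2.0],
    "audio": [0.5, 0.4, 1.0],
}
# Hypothetical hand-picked weights (assumption, not learned).
example_weights = {"text": 0.5, "image": 0.3, "audio": 0.2}

fused_probs = late_fusion(example_logits, example_weights)
predicted_class = max(range(len(fused_probs)), key=fused_probs.__getitem__)
```

Each modality is scored independently and only the final probabilities are combined, so this variant tolerates a missing modality at inference time. Early and intermediate fusion instead combine features before the prediction head and usually need all inputs present.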