🏠 Ana Sayfa
Benchmarklar
📊 Tüm Benchmarklar 🦖 Dinozor v1 🦖 Dinozor v2 ✅ To-Do List Uygulamaları 🎨 Yaratıcı Serbest Sayfalar 🎯 FSACB - Nihai Gösteri 🌍 Çeviri Benchmarkı
Modeller
🏆 En İyi 10 Model 🆓 Ücretsiz Modeller 📋 Tüm Modeller ⚙️ Kilo Code
Kaynaklar
💬 Prompt Kütüphanesi 📖 YZ Sözlüğü 🔗 Faydalı Bağlantılar

YZ Sözlüğü

Yapay Zekanın tam sözlüğü

162
kategoriler
2.032
alt kategoriler
23.060
terimler
📖
terimler

Sequence Parallelism

A form of parallelism that divides the sequence dimension of input tensors across multiple accelerators, used for Transformer-type models with long sequences.

📖
terimler

Expert Parallelism

A technique specific to dense mixture-of-experts (MoE) models where different expert networks are distributed across separate accelerators to balance the computational load.

📖
terimler

Sharded Data Parallelism

A combination of data parallelism and the ZeRO strategy, where model weights are partitioned (sharded) among workers while maintaining data parallelism.

📖
terimler

Activation Checkpointing

A memory technique that involves not storing intermediate activations during the forward pass, but recalculating them during the backward pass to save GPU memory.

📖
terimler

Hybrid Parallelism

An approach combining multiple parallelism strategies (e.g., tensor, pipeline, and data) to maximize resource utilization and scale training across thousands of accelerators.

📖
terimler

All-Reduce Communication

A collective communication operation essential to data parallelism, where local gradients from each accelerator are aggregated and redistributed to synchronize model weights.

📖
terimler

Tensor Slicing

A fundamental operation in tensor parallelism involving dividing a tensor along a specific dimension (e.g., row, column) to distribute it across multiple devices.

📖
terimler

GPipe

A pipeline parallelism implementation that uses micro-batching and activation checkpointing to efficiently train very large neural networks.

📖
terimler

Megatron-LM

Tensor parallelism architecture developed by NVIDIA, designed to train massive language models by partitioning weight matrices and gradients.

📖
terimler

DeepSpeed

Microsoft's optimization library implementing advanced techniques like ZeRO, hybrid parallelism, and memory compression for large-scale model training.

📖
terimler

Offloading

Memory management strategy where data (weights, gradients, activations) are dynamically moved between fast GPU memory and slower but more extensive CPU memory.

🔍

Sonuç bulunamadı