AI Glossary

A complete dictionary of artificial intelligence

162 categories · 2 032 subcategories · 23 060 terms

Sequence Parallelism

A form of parallelism that divides the sequence dimension of input tensors across multiple accelerators, used for Transformer-type models with long sequences.
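
A minimal single-process sketch in PyTorch: the sequence axis of an activation tensor is split across simulated devices, a position-wise layer runs on each shard independently, and the shards are re-joined. The toy sizes and the LayerNorm "layer" are illustrative assumptions, not part of any specific framework.

```python
# Sequence parallelism, simulated in one process: split the sequence
# dimension, process each shard independently, then gather.
import torch

batch, seq_len, hidden = 2, 8, 4
x = torch.randn(batch, seq_len, hidden)

num_devices = 2                                # pretend accelerators
shards = torch.chunk(x, num_devices, dim=1)    # split along the sequence axis

# A position-wise op (e.g. LayerNorm, MLP) needs no cross-shard communication.
layer = torch.nn.LayerNorm(hidden)
outputs = [layer(shard) for shard in shards]

y = torch.cat(outputs, dim=1)                  # gather along the sequence axis
assert torch.allclose(y, layer(x), atol=1e-6)  # matches the unsharded result
```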

Expert Parallelism

A technique specific to sparse mixture-of-experts (MoE) models, where different expert networks are distributed across separate accelerators to balance the computational load.
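
A toy PyTorch sketch of the idea, assuming top-1 routing; the router, the experts, and the sizes are illustrative, and the per-expert loop stands in for the all-to-all dispatch a real system would perform between devices.

```python
# Expert parallelism for a sparse MoE layer, simulated in one process:
# each expert "lives" on its own device, tokens are routed to the top-1
# expert, and the outputs are scattered back into token order.
import torch

hidden, num_experts, tokens = 4, 2, 6
experts = [torch.nn.Linear(hidden, hidden) for _ in range(num_experts)]
router = torch.nn.Linear(hidden, num_experts)

x = torch.randn(tokens, hidden)
expert_ids = router(x).argmax(dim=-1)    # top-1 routing decision per token

y = torch.empty_like(x)
for e in range(num_experts):
    mask = expert_ids == e               # tokens owned by expert e
    if mask.any():
        # In a real system this is an all-to-all dispatch to device e.
        y[mask] = experts[e](x[mask])
```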

Sharded Data Parallelism

A combination of data parallelism and the ZeRO strategy, where model weights are partitioned (sharded) among workers while maintaining data parallelism.
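
A single-process simulation of the core ZeRO idea, with a Python list standing in for workers and torch.cat standing in for a real all-gather; nothing here is a real distributed API call.

```python
# ZeRO-style sharded data parallelism, simulated: the flat parameter
# vector is partitioned across workers, and an all-gather reconstructs
# the full weights just before they are needed.
import torch

world_size = 4
flat_params = torch.randn(16)                        # all weights, flattened
shards = list(torch.chunk(flat_params, world_size))  # worker i keeps shards[i]

def all_gather(shards):
    # Stand-in for torch.distributed.all_gather: every worker temporarily
    # materializes the full parameter vector for the forward pass.
    return torch.cat(shards)

full = all_gather(shards)
assert torch.equal(full, flat_params)
# After forward/backward each worker frees `full` and keeps only its own
# shard, so steady-state per-worker memory scales as 1/world_size.
```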

Activation Checkpointing

A memory-saving technique that discards intermediate activations during the forward pass and recomputes them during the backward pass, trading extra compute for reduced GPU memory.
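
PyTorch ships this technique as torch.utils.checkpoint; a minimal example follows, where the block and the sizes are illustrative, and use_reentrant=False assumes a reasonably recent PyTorch version.

```python
# Activation checkpointing with PyTorch's built-in utility: the block's
# intermediate activations are not stored during forward and are
# recomputed during backward.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 64)
)
x = torch.randn(8, 64, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False)  # forward without saving activations
y.sum().backward()                             # block re-runs here to rebuild them
```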

Hybrid Parallelism

An approach combining multiple parallelism strategies (e.g., tensor, pipeline, and data) to maximize resource utilization and scale training across thousands of accelerators.
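
A small sketch of how a 3-D hybrid layout might map global ranks onto (data, pipeline, tensor) coordinates; the 2×2×2 factorization over 8 accelerators and the axis ordering are illustrative assumptions, not a fixed convention.

```python
# Hybrid (3-D) parallelism: assign each global rank a coordinate in a
# data x pipeline x tensor device mesh.
data_par, pipe_par, tensor_par = 2, 2, 2
world_size = data_par * pipe_par * tensor_par

for rank in range(world_size):
    tensor_rank = rank % tensor_par                # fastest-varying axis
    pipe_rank = (rank // tensor_par) % pipe_par
    data_rank = rank // (tensor_par * pipe_par)    # slowest-varying axis
    print(f"rank {rank}: data={data_rank} pipeline={pipe_rank} tensor={tensor_rank}")
```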

All-Reduce Communication

A collective communication operation essential to data parallelism, where local gradients from each accelerator are aggregated and redistributed to synchronize model weights.
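
A toy single-process stand-in for the operation (in a real job this would be torch.distributed.all_reduce over a process group); the gradient-averaging convention and sizes are illustrative.

```python
# All-reduce, simulated: each worker holds a local gradient, the gradients
# are summed, and the averaged result is handed back to every worker so
# the model replicas stay synchronized.
import torch

world_size = 4
local_grads = [torch.randn(3) for _ in range(world_size)]  # one per worker

def all_reduce_mean(grads):
    total = torch.stack(grads).sum(dim=0)        # reduce: element-wise sum
    return [total / len(grads) for _ in grads]   # redistribute the average

synced = all_reduce_mean(local_grads)
assert all(torch.equal(synced[0], g) for g in synced)  # every worker agrees
```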

Tensor Slicing

A fundamental operation in tensor parallelism involving dividing a tensor along a specific dimension (e.g., row, column) to distribute it across multiple devices.
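
A minimal sketch of the operation: a weight matrix is sliced by columns across two simulated devices, each computes a partial product, and concatenation along the sliced dimension recovers the full result.

```python
# Tensor slicing: column-wise partition of a weight matrix across two
# (simulated) devices.
import torch

x = torch.randn(4, 8)              # activations
w = torch.randn(8, 6)              # full weight matrix
w0, w1 = torch.chunk(w, 2, dim=1)  # column slices, one per device

y0 = x @ w0                        # partial result on device 0
y1 = x @ w1                        # partial result on device 1
y = torch.cat([y0, y1], dim=1)     # gather along the sliced dimension

assert torch.allclose(y, x @ w, atol=1e-5)  # matches the unsliced matmul
```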

GPipe

A pipeline parallelism implementation that uses micro-batching and activation checkpointing to efficiently train very large neural networks.
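
A toy schedule in the spirit of GPipe's micro-batching, with two illustrative stages run in one process; a real implementation places the stages on different devices and overlaps their work rather than looping sequentially.

```python
# GPipe-style micro-batching: split the mini-batch into micro-batches
# that flow through the pipeline stages one after another.
import torch

stage0 = torch.nn.Linear(8, 8)   # would live on device 0 in a real setup
stage1 = torch.nn.Linear(8, 2)   # would live on device 1

batch = torch.randn(16, 8)
micro_batches = torch.chunk(batch, 4, dim=0)   # 4 micro-batches

outputs = []
for mb in micro_batches:
    h = stage0(mb)                 # in a real pipeline, stage0 starts the
    outputs.append(stage1(h))      # next micro-batch while stage1 runs this one
y = torch.cat(outputs, dim=0)      # reassemble the full mini-batch output
```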

Megatron-LM

A tensor parallelism architecture developed by NVIDIA, designed to train massive language models by partitioning weight matrices and their gradients across GPUs.
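
A single-process sketch of the Megatron-style parallel MLP: the first projection is split by columns and the second by rows, so the nonlinearity needs no communication, and a final summation stands in for the closing all-reduce. Sizes are illustrative.

```python
# Megatron-style MLP partitioning: column-parallel first GEMM,
# row-parallel second GEMM, one all-reduce at the end.
import torch

x = torch.randn(4, 8)
w1 = torch.randn(8, 16)                  # first projection, split column-wise
w2 = torch.randn(16, 8)                  # second projection, split row-wise

w1_a, w1_b = torch.chunk(w1, 2, dim=1)   # device A / device B column slices
w2_a, w2_b = torch.chunk(w2, 2, dim=0)   # matching row slices

h_a = torch.relu(x @ w1_a)               # each device sees full rows, so the
h_b = torch.relu(x @ w1_b)               # elementwise ReLU is local
y = h_a @ w2_a + h_b @ w2_b              # the "+" is the all-reduce step

assert torch.allclose(y, torch.relu(x @ w1) @ w2, atol=1e-4)
```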

DeepSpeed

Microsoft's optimization library implementing advanced techniques like ZeRO, hybrid parallelism, and memory compression for large-scale model training.
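
A hedged sketch of enabling ZeRO through DeepSpeed's config dictionary; it assumes the deepspeed package is installed and is meant to run under a distributed launcher, and the config values below are illustrative rather than a tuned recipe.

```python
# Wrapping a model with DeepSpeed and ZeRO stage 2 (shards optimizer
# state and gradients across workers).
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)
ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```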

Offloading

A memory management strategy in which data (weights, gradients, activations) is dynamically moved between fast GPU memory and slower but much larger CPU memory.
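
A minimal sketch of just-in-time weight offloading using plain .to() moves; the layer and sizes are illustrative, and the example falls back to CPU when no GPU is present.

```python
# Offloading: parameters rest in host (CPU) memory and are moved to the
# GPU only for the duration of the layer's computation.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(4096, 4096)  # parked in CPU memory between uses
x = torch.randn(8, 4096, device=device)

layer.to(device)                     # upload weights just in time
y = layer(x)
layer.to("cpu")                      # evict weights to free GPU memory
```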
