
AI Glossary

A complete dictionary of Artificial Intelligence

162 categories
2,032 subcategories
23,060 terms

Sequence Parallelism

A form of parallelism that divides the sequence dimension of input tensors across multiple accelerators, used for Transformer-type models with long sequences.
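A minimal single-process sketch of the idea, using plain Python lists in place of tensors and workers; the function name `split_sequence` is illustrative, not from any library:

```python
# Illustrative sketch of sequence parallelism: the sequence axis of an
# input is split into contiguous chunks, one per accelerator.

def split_sequence(tokens, num_workers):
    """Divide a token sequence into contiguous per-worker chunks."""
    chunk = (len(tokens) + num_workers - 1) // num_workers  # ceiling division
    return [tokens[i * chunk:(i + 1) * chunk] for i in range(num_workers)]

# A sequence of 10 token ids distributed across 4 workers.
shards = split_sequence(list(range(10)), 4)
# Each worker processes only its chunk; concatenating recovers the sequence.
assert [t for shard in shards for t in shard] == list(range(10))
```

Each worker then runs attention or feed-forward computation on its chunk only, which is what makes very long sequences fit in memory.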


Expert Parallelism

A technique specific to sparse mixture-of-experts (MoE) models, where different expert networks are distributed across separate accelerators to balance the computational load.
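A toy sketch of the routing step that expert parallelism relies on: each token is assigned to one expert, and each expert would live on its own device. The names and the hash-based "router" are illustrative stand-ins for a learned gating network:

```python
# Hypothetical sketch of top-1 MoE routing: every token goes to exactly
# one expert; in expert parallelism each expert sits on a separate device.

def route_tokens(tokens, num_experts):
    """Assign each token to one expert (toy deterministic router)."""
    assignments = {e: [] for e in range(num_experts)}
    for i, tok in enumerate(tokens):
        expert = i % num_experts  # stand-in for a learned gating network
        assignments[expert].append(tok)
    return assignments

experts = route_tokens(["the", "cat", "sat", "on", "mat"], num_experts=2)
# Every token is processed by exactly one expert, so work is split up.
assert sum(len(v) for v in experts.values()) == 5
```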


Sharded Data Parallelism

A combination of data parallelism and the ZeRO strategy, in which model weights are partitioned (sharded) among the data-parallel workers instead of being fully replicated on each one.
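A minimal sketch of the sharding idea, assuming a simple round-robin partition; the function name is illustrative and this is not the DeepSpeed or FSDP API:

```python
# ZeRO-style sharding sketch: instead of every data-parallel worker
# holding a full parameter copy, each worker owns only one shard.

def shard_params(params, num_workers):
    """Round-robin the parameter list across workers."""
    shards = [[] for _ in range(num_workers)]
    for i, p in enumerate(params):
        shards[i % num_workers].append(p)
    return shards

params = [f"w{i}" for i in range(8)]
shards = shard_params(params, num_workers=4)
# Each worker stores 1/4 of the parameters instead of all 8.
assert all(len(s) == 2 for s in shards)
assert sorted(p for s in shards for p in s) == sorted(params)
```

In a real system the workers gather the missing shards just-in-time for each layer's computation, which is what keeps per-device memory low.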


Activation Checkpointing

A memory-saving technique that stores only selected intermediate activations (checkpoints) during the forward pass and recomputes the rest during the backward pass, trading extra computation for reduced GPU memory.
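A toy illustration of the trade-off, assuming a chain of simple layers; this is a conceptual sketch, not the `torch.utils.checkpoint` API:

```python
# Activation checkpointing sketch: the forward pass keeps only each
# layer's input (the checkpoint), and the backward pass recomputes the
# discarded output on demand, paying compute to save memory.

layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

def forward_with_checkpoints(x):
    """Run forward, storing only each layer's input, not its output."""
    checkpoints = []
    for layer in layers:
        checkpoints.append(x)   # cheap to keep: one value per layer
        x = layer(x)            # the output itself is discarded after use
    return x, checkpoints

def recompute_activation(i, checkpoints):
    """During backward, rebuild layer i's output from its saved input."""
    return layers[i](checkpoints[i])

out, ckpts = forward_with_checkpoints(5)
assert out == 9                               # (5 + 1) * 2 - 3
assert recompute_activation(1, ckpts) == 12   # (5 + 1) * 2, recomputed
```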


Hybrid Parallelism

An approach combining multiple parallelism strategies (e.g., tensor, pipeline, and data) to maximize resource utilization and scale training across thousands of accelerators.


All-Reduce Communication

A collective communication operation essential to data parallelism, where local gradients from each accelerator are aggregated and redistributed to synchronize model weights.
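A single-process simulation of the operation, with plain lists standing in for per-device gradient tensors; real implementations (e.g. NCCL) run this as a distributed collective:

```python
# Simulated all-reduce over local gradients: each "worker" contributes
# its gradient vector, the element-wise sum is computed, and every
# worker receives the same aggregated result.

def all_reduce_sum(local_grads):
    """Sum gradients element-wise across workers, broadcast the result."""
    total = [sum(vals) for vals in zip(*local_grads)]
    return [list(total) for _ in local_grads]  # every worker gets a copy

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 workers, 2 params each
reduced = all_reduce_sum(grads)
# After all-reduce, every worker holds identical summed gradients,
# so their weight updates stay synchronized.
assert all(g == [9.0, 12.0] for g in reduced)
```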


Tensor Slicing

A fundamental operation in tensor parallelism involving dividing a tensor along a specific dimension (e.g., row, column) to distribute it across multiple devices.
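A sketch of a column-wise slice, using nested lists as matrices: each device computes the product against its shard, and concatenating the partial outputs recovers the full result. Function names are illustrative:

```python
# Tensor slicing sketch: a weight matrix W is split column-wise across
# two devices; each computes x @ W_shard, and concatenating the partial
# outputs equals the full x @ W.

def matvec(x, W):
    """Row vector times matrix: result[j] = sum_i x[i] * W[i][j]."""
    return [sum(x[i] * W[i][j] for i in range(len(x)))
            for j in range(len(W[0]))]

def split_columns(W, parts):
    """Slice W along its column dimension into contiguous shards."""
    cols = len(W[0]) // parts
    return [[row[p * cols:(p + 1) * cols] for row in W]
            for p in range(parts)]

x = [1.0, 2.0]
W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
shards = split_columns(W, parts=2)
partial = [matvec(x, shard) for shard in shards]  # one per device
combined = partial[0] + partial[1]                # concatenate outputs
assert combined == matvec(x, W)
```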


GPipe

A pipeline parallelism implementation that uses micro-batching and activation checkpointing to efficiently train very large neural networks.
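A conceptual sketch of the micro-batching part of GPipe, run sequentially in one process; in the real system the micro-batches overlap across pipeline stages so devices are not idle:

```python
# GPipe-style micro-batching sketch: a mini-batch is cut into
# micro-batches that flow through the pipeline stages one after another.

def run_pipeline(batch, stages, num_micro):
    """Split `batch` into micro-batches, push each through all stages."""
    size = len(batch) // num_micro
    micro = [batch[i * size:(i + 1) * size] for i in range(num_micro)]
    outputs = []
    for mb in micro:                  # in a real pipeline these overlap
        for stage in stages:
            mb = [stage(x) for x in mb]
        outputs.extend(mb)
    return outputs

stages = [lambda x: x * 2, lambda x: x + 1]  # two toy pipeline stages
result = run_pipeline([1, 2, 3, 4], stages, num_micro=2)
assert result == [3, 5, 7, 9]
```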


Megatron-LM

A tensor parallelism framework developed by NVIDIA, designed to train massive language models by partitioning weight matrices and their gradients across GPUs.


DeepSpeed

Microsoft's optimization library implementing advanced techniques like ZeRO, hybrid parallelism, and memory compression for large-scale model training.


Offloading

A memory management strategy in which data (weights, gradients, activations) is dynamically moved between fast GPU memory and slower but much larger CPU memory.
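A toy model of the idea, with dicts standing in for device memories and a one-tensor "GPU"; all names and the eviction policy are illustrative stand-ins, not any real runtime's API:

```python
# Offloading sketch: a small "GPU" dict holds only the tensor currently
# needed, while the larger "CPU" dict holds everything else.

cpu_memory = {"layer0.w": [0.1], "layer1.w": [0.2], "layer2.w": [0.3]}
gpu_memory = {}
GPU_CAPACITY = 1  # this toy GPU can hold one tensor at a time

def fetch(name):
    """Move a tensor to the GPU, offloading the oldest one if full."""
    if name not in gpu_memory:
        if len(gpu_memory) >= GPU_CAPACITY:
            evicted = next(iter(gpu_memory))
            cpu_memory[evicted] = gpu_memory.pop(evicted)  # offload back
        gpu_memory[name] = cpu_memory.pop(name)
    return gpu_memory[name]

fetch("layer0.w")
fetch("layer1.w")  # layer0.w is offloaded back to CPU memory
assert "layer1.w" in gpu_memory and "layer0.w" in cpu_memory
```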
