AI Glossary
The complete AI glossary
PCIe Bandwidth
Maximum data transfer rate of the PCIe bus, which bounds how fast the CPU and GPU can exchange data in AI workloads; a PCIe 4.0 x16 link tops out at roughly 32 GB/s per direction.
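As a quick sanity check, the following minimal sketch measures effective host-to-device bandwidth with CUDA events; the 256 MB buffer size is an arbitrary choice, and the achieved figure depends on PCIe generation, lane count, and platform.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 256UL << 20;          // 256 MB test buffer
        float *host, *dev;
        cudaMallocHost(&host, bytes);              // pinned, so DMA runs at full speed
        cudaMalloc(&dev, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.f;
        cudaEventElapsedTime(&ms, start, stop);    // elapsed time in milliseconds
        printf("H2D bandwidth: %.1f GB/s\n", (bytes / 1e9) / (ms / 1e3));

        cudaFree(dev);
        cudaFreeHost(host);
        return 0;
    }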
NVMe over Fabrics
Protocol that extends NVMe storage access across a network fabric such as RDMA or TCP, giving remote storage near-local latency for large AI datasets.
GPUDirect Storage
NVIDIA technology enabling direct data transfer from storage to GPU memory, bypassing the CPU and RAM.
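A hedged sketch of the cuFile API through which GPUDirect Storage is exposed (link with -lcufile); it assumes a GDS-capable driver and filesystem, and data.bin is a placeholder file name. GDS requires the file to be opened with O_DIRECT.

    #include <fcntl.h>
    #include <cstdio>
    #include <cuda_runtime.h>
    #include <cufile.h>                            // cuFile: the GDS user-space API

    int main() {
        const size_t bytes = 1UL << 20;
        int fd = open("data.bin", O_RDONLY | O_DIRECT);  // O_DIRECT is mandatory for GDS
        if (fd < 0) return 1;

        cuFileDriverOpen();
        CUfileDescr_t descr = {};
        descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
        descr.handle.fd = fd;
        CUfileHandle_t fh;
        cuFileHandleRegister(&fh, &descr);

        void *dev;
        cudaMalloc(&dev, bytes);
        cuFileBufRegister(dev, bytes, 0);          // optional: pre-register the GPU buffer

        // Storage-to-VRAM read with no bounce buffer in CPU RAM
        ssize_t n = cuFileRead(fh, dev, bytes, /*file_offset=*/0, /*dev_offset=*/0);
        printf("read %zd bytes straight into GPU memory\n", n);

        cuFileBufDeregister(dev);
        cuFileHandleDeregister(fh);
        cudaFree(dev);
        cuFileDriverClose();
        return 0;
    }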
Memory Pinning
Locking memory pages in RAM so the operating system cannot page them out or relocate them, giving the GPU's DMA (Direct Memory Access) engine stable physical addresses for fast transfers.
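A minimal sketch of pinning with cudaMallocHost: because the pages cannot be swapped out or relocated, cudaMemcpyAsync can run as a genuine DMA transfer.

    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 64UL << 20;
        float *pinned, *dev;
        cudaMallocHost(&pinned, bytes);            // page-locked host allocation
        cudaMalloc(&dev, bytes);

        // With pageable memory this copy would be staged through a
        // driver-internal bounce buffer; with pinned memory it is pure DMA.
        cudaMemcpyAsync(dev, pinned, bytes, cudaMemcpyHostToDevice);
        cudaDeviceSynchronize();

        cudaFree(dev);
        cudaFreeHost(pinned);
        return 0;
    }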
Zero-Copy
Optimization technique where a device accesses data in place (for example, a GPU kernel reading mapped host memory) rather than staging an intermediate copy in CPU memory.
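A minimal zero-copy sketch: the host buffer is allocated as mapped pinned memory and the kernel dereferences it in place, so no cudaMemcpy appears anywhere. Whether this beats an explicit copy depends on the access pattern, since every access crosses the PCIe bus.

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;                // GPU touches host memory directly
    }

    int main() {
        cudaSetDeviceFlags(cudaDeviceMapHost);     // allow mapped host allocations
        const int n = 1 << 20;
        float *host, *devView;
        cudaHostAlloc((void **)&host, n * sizeof(float), cudaHostAllocMapped);
        for (int i = 0; i < n; ++i) host[i] = 1.0f;

        cudaHostGetDevicePointer((void **)&devView, host, 0);
        scale<<<(n + 255) / 256, 256>>>(devView, n);
        cudaDeviceSynchronize();

        printf("host[0] = %.1f\n", host[0]);       // prints 2.0
        cudaFreeHost(host);
        return 0;
    }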
Tensor Core Throughput
Computing capacity of the GPU's Tensor Cores, in practice often limited by how fast data can be fed from memory rather than by the cores themselves.
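As a rough roofline-style illustration, using NVIDIA's published A100 figures (about 312 TFLOP/s FP16 Tensor Core peak and roughly 1.56 TB/s HBM2 bandwidth): a kernel must reach an arithmetic intensity of about

    I_{\min} = \frac{P_{\text{peak}}}{B_{\text{mem}}}
             = \frac{312 \times 10^{12}\ \text{FLOP/s}}{1.56 \times 10^{12}\ \text{B/s}}
             \approx 200\ \text{FLOP/B}

before the Tensor Cores, rather than memory bandwidth, become the bottleneck.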
Data Pipeline Parallelism
Strategy where data loading, preprocessing, and transfer execute in parallel with GPU computation to hide latencies.
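A minimal double-buffering sketch of the pattern, where load_chunk is a hypothetical stand-in for disk I/O and CPU preprocessing: while the GPU processes chunk c in one stream, the CPU loads and the copy engine transfers chunk c+1 in the other.

    #include <cstring>
    #include <cuda_runtime.h>

    __global__ void process(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1.0f;                   // stand-in computation
    }

    // Hypothetical stand-in for disk I/O and CPU-side preprocessing
    static void load_chunk(float *dst, int n, int chunk) {
        memset(dst, chunk, n * sizeof(float));
    }

    int main() {
        const int n = 1 << 20, chunks = 8;
        const size_t bytes = n * sizeof(float);
        float *host[2], *dev[2];
        cudaStream_t s[2];
        for (int b = 0; b < 2; ++b) {
            cudaMallocHost(&host[b], bytes);          // pinned, needed for async DMA
            cudaMalloc(&dev[b], bytes);
            cudaStreamCreate(&s[b]);
        }

        for (int c = 0; c < chunks; ++c) {
            int b = c & 1;                            // ping-pong between buffers
            cudaStreamSynchronize(s[b]);              // wait until buffer b is free
            load_chunk(host[b], n, c);                // CPU load overlaps GPU work
            cudaMemcpyAsync(dev[b], host[b], bytes, cudaMemcpyHostToDevice, s[b]);
            process<<<(n + 255) / 256, 256, 0, s[b]>>>(dev[b], n);
            // the copy for chunk c overlaps the kernel for chunk c-1
        }
        cudaDeviceSynchronize();

        for (int b = 0; b < 2; ++b) {
            cudaStreamDestroy(s[b]);
            cudaFree(dev[b]);
            cudaFreeHost(host[b]);
        }
        return 0;
    }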
Prefetching
Loading data into GPU memory before computation needs it, so the GPU stays busy instead of waiting on transfers.
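A minimal sketch with CUDA managed memory, where cudaMemPrefetchAsync migrates the pages to the GPU ahead of the kernel instead of paying on-demand page-fault costs on first touch (assumes a platform that supports managed-memory migration).

    #include <cuda_runtime.h>

    __global__ void consume(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 0.5f;
    }

    int main() {
        const int n = 1 << 24;
        float *data;
        cudaMallocManaged(&data, n * sizeof(float));
        for (int i = 0; i < n; ++i) data[i] = 1.0f;   // pages now resident on the host

        int device = 0;
        cudaGetDevice(&device);
        // Start migrating pages to the GPU before the kernel needs them
        cudaMemPrefetchAsync(data, n * sizeof(float), device);
        consume<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();

        cudaFree(data);
        return 0;
    }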
Host-to-Device Latency
Time required to initiate and complete a data transfer from the CPU (host) to the GPU (device).
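A small sketch that isolates per-transfer overhead by timing many 4-byte copies; on typical systems the average lands in the microsecond range, which is why batching many small transfers into one large copy pays off.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const int iters = 1000;
        int *host, *dev;
        cudaMallocHost(&host, sizeof(int));        // pinned, so we measure the bus
        cudaMalloc(&dev, sizeof(int));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        for (int i = 0; i < iters; ++i)            // 4-byte copies: pure overhead
            cudaMemcpy(dev, host, sizeof(int), cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("average H2D latency: %.1f us\n", ms * 1000.f / iters);

        cudaFree(dev);
        cudaFreeHost(host);
        return 0;
    }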
CUDA Stream
Ordered queue of operations (copies and kernels) on the GPU: work within one stream executes in order, while work in different streams may run concurrently, letting transfers overlap with computation.
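A minimal sketch of the semantics: within each stream the copy always precedes the kernel, but the two streams are unordered with respect to each other and may overlap on hardware with separate copy and compute engines.

    #include <cuda_runtime.h>

    __global__ void work(float *d, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] = d[i] * d[i];
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *h[2], *d[2];
        cudaStream_t s[2];

        for (int k = 0; k < 2; ++k) {
            cudaStreamCreate(&s[k]);
            cudaMallocHost(&h[k], bytes);
            cudaMalloc(&d[k], bytes);

            // In-stream order: this copy always finishes before this kernel...
            cudaMemcpyAsync(d[k], h[k], bytes, cudaMemcpyHostToDevice, s[k]);
            work<<<(n + 255) / 256, 256, 0, s[k]>>>(d[k], n);
            // ...but stream 0 and stream 1 impose no ordering on each other.
        }
        cudaDeviceSynchronize();

        for (int k = 0; k < 2; ++k) {
            cudaStreamDestroy(s[k]);
            cudaFree(d[k]);
            cudaFreeHost(h[k]);
        }
        return 0;
    }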
NUMA Awareness
Optimization of memory allocations to respect the NUMA topology of multi-CPU servers, reducing access latencies.
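A hedged sketch using libnuma (link with -lnuma). NUMA node 0 is an assumption for illustration; a real application would look up the GPU's home node, for instance from /sys/bus/pci/devices/<bdf>/numa_node.

    #include <numa.h>                              // libnuma
    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 64UL << 20;
        if (numa_available() < 0) return 1;        // no NUMA support on this system

        // Assumed for illustration: node 0 is the node closest to the GPU
        void *buf = numa_alloc_onnode(bytes, 0);   // place the pages on that node
        cudaHostRegister(buf, bytes, cudaHostRegisterDefault);  // then pin them

        float *dev;
        cudaMalloc(&dev, bytes);
        // The staging buffer now sits on the GPU-local node, shortening the DMA path
        cudaMemcpy(dev, buf, bytes, cudaMemcpyHostToDevice);

        cudaFree(dev);
        cudaHostUnregister(buf);
        numa_free(buf, bytes);
        return 0;
    }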
GPUDirect RDMA
Technology enabling direct data transfers between the GPU memories of different nodes over RDMA, without intermediate copies through CPU memory.
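A hedged fragment of the node-local half, using the verbs API (link with -libverbs): with NVIDIA's nvidia-peermem kernel module loaded, ibv_reg_mr can register GPU memory directly, so the NIC reads and writes VRAM without a host bounce buffer. Queue-pair setup and the key exchange with the remote node are omitted.

    #include <infiniband/verbs.h>                  // verbs API
    #include <cuda_runtime.h>

    int main() {
        int num = 0;
        ibv_device **devs = ibv_get_device_list(&num);
        if (!devs || num == 0) return 1;
        ibv_context *ctx = ibv_open_device(devs[0]);
        ibv_pd *pd = ibv_alloc_pd(ctx);

        void *gpu_buf;
        cudaMalloc(&gpu_buf, 1 << 20);

        // With GPUDirect RDMA active, this registers VRAM with the NIC;
        // remote peers can then RDMA-read/write it without touching the CPU.
        ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, 1 << 20,
                                IBV_ACCESS_LOCAL_WRITE |
                                IBV_ACCESS_REMOTE_READ |
                                IBV_ACCESS_REMOTE_WRITE);
        // mr->lkey / mr->rkey would be exchanged with the peer here (omitted)

        if (mr) ibv_dereg_mr(mr);
        cudaFree(gpu_buf);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }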
Asynchronous Data Transfer
Data transfer executed in parallel with GPU computations, using CUDA streams to hide latencies.
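A sketch of the host's view of an asynchronous transfer: cudaMemcpyAsync returns immediately, the CPU keeps working, and an event marks when the copy has actually completed.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 128UL << 20;
        float *host, *dev;
        cudaMallocHost(&host, bytes);              // async copies require pinned memory
        cudaMalloc(&dev, bytes);

        cudaStream_t s;
        cudaStreamCreate(&s);
        cudaEvent_t done;
        cudaEventCreate(&done);

        cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, s);
        cudaEventRecord(done, s);                  // marks the end of the copy

        long overlapped = 0;
        while (cudaEventQuery(done) == cudaErrorNotReady)
            ++overlapped;                          // CPU work overlapping the DMA

        printf("CPU loop iterations during the transfer: %ld\n", overlapped);
        cudaEventDestroy(done);
        cudaStreamDestroy(s);
        cudaFree(dev);
        cudaFreeHost(host);
        return 0;
    }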
Page-Locked Memory
Non-pageable system memory, required for high-bandwidth asynchronous DMA transfers to the GPU.