AI Glossary
The complete AI glossary
PCIe Bandwidth
Maximum data transfer rate of the PCIe bus, which bounds how fast the CPU and GPU can exchange data in AI workloads; a PCIe 4.0 x16 link tops out at roughly 32 GB/s per direction.
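As a quick sanity check, the following minimal sketch measures effective host-to-device bandwidth with CUDA events; the 256 MB buffer size is an arbitrary choice, and the achieved figure depends on PCIe generation, lane count, and platform.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 256UL << 20;          // 256 MB test buffer
        float *host, *dev;
        cudaMallocHost(&host, bytes);              // pinned, so DMA runs at full speed
        cudaMalloc(&dev, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.f;
        cudaEventElapsedTime(&ms, start, stop);    // elapsed time in milliseconds
        printf("H2D bandwidth: %.1f GB/s\n", (bytes / 1e9) / (ms / 1e3));

        cudaFree(dev);
        cudaFreeHost(host);
        return 0;
    }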
NVMe over Fabrics
Protocol that extends NVMe storage access across a network fabric such as RDMA or TCP, giving remote storage near-local latency for large AI datasets.
GPUDirect Storage
NVIDIA technology enabling direct data transfer from storage to GPU memory, bypassing the CPU and RAM.
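A hedged sketch of the cuFile API through which GPUDirect Storage is exposed (link with -lcufile); it assumes a GDS-capable driver and filesystem, and data.bin is a placeholder file name. GDS requires the file to be opened with O_DIRECT.

    #include <fcntl.h>
    #include <cstdio>
    #include <cuda_runtime.h>
    #include <cufile.h>                            // cuFile: the GDS user-space API

    int main() {
        const size_t bytes = 1UL << 20;
        int fd = open("data.bin", O_RDONLY | O_DIRECT);  // O_DIRECT is mandatory for GDS
        if (fd < 0) return 1;

        cuFileDriverOpen();
        CUfileDescr_t descr = {};
        descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
        descr.handle.fd = fd;
        CUfileHandle_t fh;
        cuFileHandleRegister(&fh, &descr);

        void *dev;
        cudaMalloc(&dev, bytes);
        cuFileBufRegister(dev, bytes, 0);          // optional: pre-register the GPU buffer

        // Storage-to-VRAM read with no bounce buffer in CPU RAM
        ssize_t n = cuFileRead(fh, dev, bytes, /*file_offset=*/0, /*dev_offset=*/0);
        printf("read %zd bytes straight into GPU memory\n", n);

        cuFileBufDeregister(dev);
        cuFileHandleDeregister(fh);
        cudaFree(dev);
        cuFileDriverClose();
        return 0;
    }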
Memory Pinning
Locking memory pages in RAM so the operating system cannot page them out or relocate them, giving the GPU's DMA (Direct Memory Access) engine stable physical addresses for fast transfers.
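A minimal sketch of pinning with cudaMallocHost: because the pages cannot be swapped out or relocated, cudaMemcpyAsync can run as a genuine DMA transfer.

    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 64UL << 20;
        float *pinned, *dev;
        cudaMallocHost(&pinned, bytes);            // page-locked host allocation
        cudaMalloc(&dev, bytes);

        // With pageable memory this copy would be staged through a
        // driver-internal bounce buffer; with pinned memory it is pure DMA.
        cudaMemcpyAsync(dev, pinned, bytes, cudaMemcpyHostToDevice);
        cudaDeviceSynchronize();

        cudaFree(dev);
        cudaFreeHost(pinned);
        return 0;
    }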
Zero-Copy
Optimization technique where a device accesses data in place (for example, a GPU kernel reading mapped host memory) rather than staging an intermediate copy in CPU memory.
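A minimal zero-copy sketch: the host buffer is allocated as mapped pinned memory and the kernel dereferences it in place, so no cudaMemcpy appears anywhere. Whether this beats an explicit copy depends on the access pattern, since every access crosses the PCIe bus.

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;                // GPU touches host memory directly
    }

    int main() {
        cudaSetDeviceFlags(cudaDeviceMapHost);     // allow mapped host allocations
        const int n = 1 << 20;
        float *host, *devView;
        cudaHostAlloc((void **)&host, n * sizeof(float), cudaHostAllocMapped);
        for (int i = 0; i < n; ++i) host[i] = 1.0f;

        cudaHostGetDevicePointer((void **)&devView, host, 0);
        scale<<<(n + 255) / 256, 256>>>(devView, n);
        cudaDeviceSynchronize();

        printf("host[0] = %.1f\n", host[0]);       // prints 2.0
        cudaFreeHost(host);
        return 0;
    }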
Tensor Core Throughput
Computing capacity of the GPU's Tensor Cores, in practice often limited by how fast data can be fed from memory rather than by the cores themselves.
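As a rough roofline-style illustration, using NVIDIA's published A100 figures (about 312 TFLOP/s FP16 Tensor Core peak and roughly 1.56 TB/s HBM2 bandwidth): a kernel must reach an arithmetic intensity of about

    I_{\min} = \frac{P_{\text{peak}}}{B_{\text{mem}}}
             = \frac{312 \times 10^{12}\ \text{FLOP/s}}{1.56 \times 10^{12}\ \text{B/s}}
             \approx 200\ \text{FLOP/B}

before the Tensor Cores, rather than memory bandwidth, become the bottleneck.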
Data Pipeline Parallelism
Strategy where data loading, preprocessing, and transfer execute in parallel with GPU computation to hide latencies.
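A minimal double-buffering sketch of the pattern, where load_chunk is a hypothetical stand-in for disk I/O and CPU preprocessing: while the GPU processes chunk c in one stream, the CPU loads and the copy engine transfers chunk c+1 in the other.

    #include <cstring>
    #include <cuda_runtime.h>

    __global__ void process(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1.0f;                   // stand-in computation
    }

    // Hypothetical stand-in for disk I/O and CPU-side preprocessing
    static void load_chunk(float *dst, int n, int chunk) {
        memset(dst, chunk, n * sizeof(float));
    }

    int main() {
        const int n = 1 << 20, chunks = 8;
        const size_t bytes = n * sizeof(float);
        float *host[2], *dev[2];
        cudaStream_t s[2];
        for (int b = 0; b < 2; ++b) {
            cudaMallocHost(&host[b], bytes);          // pinned, needed for async DMA
            cudaMalloc(&dev[b], bytes);
            cudaStreamCreate(&s[b]);
        }

        for (int c = 0; c < chunks; ++c) {
            int b = c & 1;                            // ping-pong between buffers
            cudaStreamSynchronize(s[b]);              // wait until buffer b is free
            load_chunk(host[b], n, c);                // CPU load overlaps GPU work
            cudaMemcpyAsync(dev[b], host[b], bytes, cudaMemcpyHostToDevice, s[b]);
            process<<<(n + 255) / 256, 256, 0, s[b]>>>(dev[b], n);
            // the copy for chunk c overlaps the kernel for chunk c-1
        }
        cudaDeviceSynchronize();

        for (int b = 0; b < 2; ++b) {
            cudaStreamDestroy(s[b]);
            cudaFree(dev[b]);
            cudaFreeHost(host[b]);
        }
        return 0;
    }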
Prefetching
Loading data into GPU memory before computation needs it, so the GPU stays busy instead of waiting on transfers.
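A minimal sketch with CUDA managed memory, where cudaMemPrefetchAsync migrates the pages to the GPU ahead of the kernel instead of paying on-demand page-fault costs on first touch (assumes a platform that supports managed-memory migration).

    #include <cuda_runtime.h>

    __global__ void consume(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 0.5f;
    }

    int main() {
        const int n = 1 << 24;
        float *data;
        cudaMallocManaged(&data, n * sizeof(float));
        for (int i = 0; i < n; ++i) data[i] = 1.0f;   // pages now resident on the host

        int device = 0;
        cudaGetDevice(&device);
        // Start migrating pages to the GPU before the kernel needs them
        cudaMemPrefetchAsync(data, n * sizeof(float), device);
        consume<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();

        cudaFree(data);
        return 0;
    }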
Host-to-Device Latency
Time required to initiate and complete a data transfer from the CPU (host) to the GPU (device).
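A small sketch that isolates per-transfer overhead by timing many 4-byte copies; on typical systems the average lands in the microsecond range, which is why batching many small transfers into one large copy pays off.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const int iters = 1000;
        int *host, *dev;
        cudaMallocHost(&host, sizeof(int));        // pinned, so we measure the bus
        cudaMalloc(&dev, sizeof(int));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        for (int i = 0; i < iters; ++i)            // 4-byte copies: pure overhead
            cudaMemcpy(dev, host, sizeof(int), cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("average H2D latency: %.1f us\n", ms * 1000.f / iters);

        cudaFree(dev);
        cudaFreeHost(host);
        return 0;
    }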
CUDA Stream
Ordered queue of operations (copies and kernels) on the GPU: work within one stream executes in order, while work in different streams may run concurrently, letting transfers overlap with computation.
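A minimal sketch of the semantics: within each stream the copy always precedes the kernel, but the two streams are unordered with respect to each other and may overlap on hardware with separate copy and compute engines.

    #include <cuda_runtime.h>

    __global__ void work(float *d, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] = d[i] * d[i];
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *h[2], *d[2];
        cudaStream_t s[2];

        for (int k = 0; k < 2; ++k) {
            cudaStreamCreate(&s[k]);
            cudaMallocHost(&h[k], bytes);
            cudaMalloc(&d[k], bytes);

            // In-stream order: this copy always finishes before this kernel...
            cudaMemcpyAsync(d[k], h[k], bytes, cudaMemcpyHostToDevice, s[k]);
            work<<<(n + 255) / 256, 256, 0, s[k]>>>(d[k], n);
            // ...but stream 0 and stream 1 impose no ordering on each other.
        }
        cudaDeviceSynchronize();

        for (int k = 0; k < 2; ++k) {
            cudaStreamDestroy(s[k]);
            cudaFree(d[k]);
            cudaFreeHost(h[k]);
        }
        return 0;
    }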
NUMA Awareness
Optimization of memory allocations to respect the NUMA topology of multi-CPU servers, reducing access latencies.
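A hedged sketch using libnuma (link with -lnuma). NUMA node 0 is an assumption for illustration; a real application would look up the GPU's home node, for instance from /sys/bus/pci/devices/<bdf>/numa_node.

    #include <numa.h>                              // libnuma
    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 64UL << 20;
        if (numa_available() < 0) return 1;        // no NUMA support on this system

        // Assumed for illustration: node 0 is the node closest to the GPU
        void *buf = numa_alloc_onnode(bytes, 0);   // place the pages on that node
        cudaHostRegister(buf, bytes, cudaHostRegisterDefault);  // then pin them

        float *dev;
        cudaMalloc(&dev, bytes);
        // The staging buffer now sits on the GPU-local node, shortening the DMA path
        cudaMemcpy(dev, buf, bytes, cudaMemcpyHostToDevice);

        cudaFree(dev);
        cudaHostUnregister(buf);
        numa_free(buf, bytes);
        return 0;
    }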
GPUDirect RDMA
Technology enabling direct data transfers between the GPU memories of different nodes over RDMA, without intermediate copies through CPU memory.
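A hedged fragment of the node-local half, using the verbs API (link with -libverbs): with NVIDIA's nvidia-peermem kernel module loaded, ibv_reg_mr can register GPU memory directly, so the NIC reads and writes VRAM without a host bounce buffer. Queue-pair setup and the key exchange with the remote node are omitted.

    #include <infiniband/verbs.h>                  // verbs API
    #include <cuda_runtime.h>

    int main() {
        int num = 0;
        ibv_device **devs = ibv_get_device_list(&num);
        if (!devs || num == 0) return 1;
        ibv_context *ctx = ibv_open_device(devs[0]);
        ibv_pd *pd = ibv_alloc_pd(ctx);

        void *gpu_buf;
        cudaMalloc(&gpu_buf, 1 << 20);

        // With GPUDirect RDMA active, this registers VRAM with the NIC;
        // remote peers can then RDMA-read/write it without touching the CPU.
        ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, 1 << 20,
                                IBV_ACCESS_LOCAL_WRITE |
                                IBV_ACCESS_REMOTE_READ |
                                IBV_ACCESS_REMOTE_WRITE);
        // mr->lkey / mr->rkey would be exchanged with the peer here (omitted)

        if (mr) ibv_dereg_mr(mr);
        cudaFree(gpu_buf);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }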
Asynchronous Data Transfer
Data transfer executed in parallel with GPU computations, using CUDA streams to hide latencies.
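A sketch of the host's view of an asynchronous transfer: cudaMemcpyAsync returns immediately, the CPU keeps working, and an event marks when the copy has actually completed.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 128UL << 20;
        float *host, *dev;
        cudaMallocHost(&host, bytes);              // async copies require pinned memory
        cudaMalloc(&dev, bytes);

        cudaStream_t s;
        cudaStreamCreate(&s);
        cudaEvent_t done;
        cudaEventCreate(&done);

        cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, s);
        cudaEventRecord(done, s);                  // marks the end of the copy

        long overlapped = 0;
        while (cudaEventQuery(done) == cudaErrorNotReady)
            ++overlapped;                          // CPU work overlapping the DMA

        printf("CPU loop iterations during the transfer: %ld\n", overlapped);
        cudaEventDestroy(done);
        cudaStreamDestroy(s);
        cudaFree(dev);
        cudaFreeHost(host);
        return 0;
    }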
Page-Locked Memory
Non-pageable system memory, required for high-bandwidth asynchronous DMA transfers to the GPU.