AI Glossary
The Complete Dictionary of Artificial Intelligence
GPU-Accelerated Computing
Parallel computing architecture using graphics processors (GPUs) to accelerate the execution of compute-intensive applications, particularly AI model training.
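A minimal CUDA sketch of the idea, with illustrative sizes: a data-parallel loop is offloaded to thousands of GPU threads that each handle one array element.

    #include <cuda_runtime.h>
    #include <stdio.h>

    // Each GPU thread computes one element; many thousands run in parallel.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main(void) {
        const int n = 1 << 20;                    // 1M elements (illustrative)
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);             // unified memory, for brevity
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads; // enough blocks to cover n
        vecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();                  // wait for the GPU to finish

        printf("c[0] = %f\n", c[0]);              // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }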
PCIe (Peripheral Component Interconnect Express)
High-speed serial bus standard for connecting internal computer components, such as GPUs, network cards, and SSDs, to the CPU.
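As a rough worked example with PCIe 4.0 numbers (16 GT/s per lane, 128b/130b encoding), a x16 link carries per direction approximately

    16 \,\mathrm{GT/s} \times \tfrac{128}{130} \approx 15.75 \,\mathrm{Gbit/s\ per\ lane}, \qquad 16 \times 15.75 \,\mathrm{Gbit/s} \approx 252 \,\mathrm{Gbit/s} \approx 31.5 \,\mathrm{GB/s}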
InfiniBand
Very high-performance, low-latency computer networking standard, primarily used in HPC clusters for interconnecting compute nodes.
RDMA (Remote Direct Memory Access)
Technology allowing one computer to read from or write to another computer's memory directly over the network, without involving either host's operating system or CPU on the data path, thereby reducing latency and CPU load.
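The setup half of an RDMA transfer can be sketched with the Linux verbs API (libibverbs). The sketch below only opens a device and registers a buffer that the NIC may then access directly; queue-pair creation and the out-of-band exchange of the buffer address and rkey with the peer are omitted for brevity.

    #include <infiniband/verbs.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        // Enumerate RDMA-capable devices and open the first one.
        struct ibv_device **devs = ibv_get_device_list(NULL);
        if (!devs || !devs[0]) { fprintf(stderr, "no RDMA device\n"); return 1; }
        struct ibv_context *ctx = ibv_open_device(devs[0]);

        // A protection domain scopes which resources may touch which memory.
        struct ibv_pd *pd = ibv_alloc_pd(ctx);

        // Register (pin) a buffer so the NIC can DMA to and from it directly,
        // bypassing both operating systems on the data path.
        size_t len = 4096;
        void *buf = malloc(len);
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE);

        // A peer needs buf's address and mr->rkey to issue RDMA reads/writes.
        printf("rkey=0x%x addr=%p\n", mr->rkey, buf);

        ibv_dereg_mr(mr);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        free(buf);
        return 0;
    }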
NUMA (Non-Uniform Memory Access)
Multiprocessor memory architecture where memory access time depends on the memory location relative to the processor, a crucial factor for optimizing HPC applications.
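A hedged C sketch of the standard optimization, using libnuma (link with -lnuma; the node number is illustrative): keep a thread and the memory it works on within the same NUMA node, so every access is local.

    #include <numa.h>
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) { fprintf(stderr, "no NUMA support\n"); return 1; }

        printf("NUMA nodes: %d\n", numa_num_configured_nodes());

        // Allocate on node 0 and pin the calling thread to the same node,
        // avoiding slower remote-node memory accesses.
        size_t len = 64 * 1024 * 1024;
        void *buf = numa_alloc_onnode(len, 0);
        numa_run_on_node(0);

        // ... compute on buf ...

        numa_free(buf, len);
        return 0;
    }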
Fabric Interconnect
High-speed, low-latency interconnection network that links compute nodes, GPUs, and storage systems within an HPC cluster, forming a unified communication 'fabric'.
DGX
NVIDIA's reference integrated AI system, combining GPUs interconnected via NVLink, an optimized system architecture, and a complete software stack to accelerate AI development and deployment. (DGX is a product name rather than an official acronym.)
RoCE (RDMA over Converged Ethernet)
Protocol enabling RDMA technology implementation over standard Ethernet networks, providing an alternative to InfiniBand for low-latency, high-bandwidth interconnects.
Slurm (Simple Linux Utility for Resource Management)
Open-source resource manager and job scheduler, widely used in HPC clusters, that allocates CPU, GPU, and memory resources and schedules the execution of AI workloads.
GPUDirect
Set of NVIDIA technologies allowing a GPU to directly access the memory of other devices (other GPUs, NICs, SSDs) without going through CPU memory, reducing latency and data copying.
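One member of this family, GPUDirect Peer-to-Peer, is visible directly in the CUDA runtime API. A minimal sketch, assuming two P2P-capable GPUs in the same node:

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void) {
        int can01 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);   // can GPU 0 reach GPU 1's memory?
        if (!can01) { fprintf(stderr, "no P2P path between GPU 0 and 1\n"); return 1; }

        size_t bytes = 1 << 20;
        float *src, *dst;

        cudaSetDevice(1);
        cudaMalloc(&src, bytes);                 // buffer on GPU 1

        cudaSetDevice(0);
        cudaMalloc(&dst, bytes);                 // buffer on GPU 0
        cudaDeviceEnablePeerAccess(1, 0);        // let GPU 0 address GPU 1 directly

        // Copy GPU 1 -> GPU 0 over NVLink/PCIe without staging in CPU memory.
        cudaMemcpyPeer(dst, 0, src, 1, bytes);
        cudaDeviceSynchronize();

        printf("peer copy done\n");
        cudaFree(dst);
        cudaSetDevice(1);
        cudaFree(src);
        return 0;
    }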
SHARP (Scalable Hierarchical Aggregation and Reduction Protocol)
Network acceleration technology that offloads collective reduction operations (e.g., the all-reduce used in distributed training) into the network switches themselves, freeing up CPU/GPU resources and reducing the data crossing the fabric.
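From application code, SHARP is transparent: the program issues an ordinary all-reduce (for example through NCCL), and a SHARP-capable fabric executes the reduction inside the switches. A single-process NCCL sketch over the GPUs of one node (buffer size illustrative):

    #include <nccl.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void) {
        int ndev = 0;
        cudaGetDeviceCount(&ndev);
        if (ndev > 8) ndev = 8;                   // fixed-size arrays below

        ncclComm_t comms[8];
        cudaStream_t streams[8];
        float *buf[8];
        const size_t count = 1 << 20;             // 1M floats (illustrative)

        ncclCommInitAll(comms, ndev, NULL);       // one communicator per GPU

        for (int i = 0; i < ndev; i++) {
            cudaSetDevice(i);
            cudaStreamCreate(&streams[i]);
            cudaMalloc(&buf[i], count * sizeof(float));
            cudaMemset(buf[i], 0, count * sizeof(float));
        }

        // Element-wise sum of buf across all GPUs, written back in place.
        // On a SHARP-enabled fabric, the inter-node part of this reduction
        // runs inside the switches instead of on the endpoints.
        ncclGroupStart();
        for (int i = 0; i < ndev; i++)
            ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                          comms[i], streams[i]);
        ncclGroupEnd();

        for (int i = 0; i < ndev; i++) {
            cudaSetDevice(i);
            cudaStreamSynchronize(streams[i]);
            cudaFree(buf[i]);
            ncclCommDestroy(comms[i]);
        }
        printf("all-reduce done on %d GPUs\n", ndev);
        return 0;
    }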
Node-to-Node Latency
Measurement of the delay time for transmitting a data packet between two distinct compute nodes in a cluster, a key performance indicator for distributed AI algorithms.
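Node-to-node latency is commonly measured with a ping-pong microbenchmark. A C/MPI sketch, to be launched with two ranks placed on two different nodes:

    #include <mpi.h>
    #include <stdio.h>

    // Classic ping-pong: ranks 0 and 1 bounce a 1-byte message back and
    // forth; half the average round-trip time approximates one-way latency.
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char byte = 0;
        const int iters = 10000;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("one-way latency ~ %.2f us\n",
                   (t1 - t0) / iters / 2.0 * 1e6);

        MPI_Finalize();
        return 0;
    }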
Bisection Bandwidth
Measurement of an interconnection network's overall communication capacity, defined as the minimum aggregate bandwidth of the links that must be cut to divide the network into two equal halves; a key indicator of how well a cluster supports all-to-all communication.
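A worked example, assuming a non-blocking fat-tree with 64 nodes and 200 Gbit/s links: any cut into two halves of 32 nodes crosses at least 32 links, so

    \mathrm{BW}_{\mathrm{bisection}} = 32 \times 200 \,\mathrm{Gbit/s} = 6.4 \,\mathrm{Tbit/s} = 800 \,\mathrm{GB/s}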
Cache Coherency
Mechanism ensuring that all copies of a data block held in the different caches of a multiprocessor system stay in sync, essential for data consistency in multi-CPU/GPU HPC architectures.
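The cost of coherency traffic is visible from software as false sharing: two threads writing neighboring variables that happen to share one cache line force that line to bounce between cores. A hedged pthreads sketch (a 64-byte cache line and iteration count are assumed for illustration):

    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 100000000L

    // a and b share one cache line: every write by one core invalidates the
    // other core's cached copy, generating constant coherency traffic.
    // Padding them apart (e.g., char pad[64] between a and b) removes it.
    struct { volatile long a; volatile long b; } line;

    void *bump_a(void *arg) { for (long i = 0; i < ITERS; i++) line.a++; return NULL; }
    void *bump_b(void *arg) { for (long i = 0; i < ITERS; i++) line.b++; return NULL; }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump_a, NULL);
        pthread_create(&t2, NULL, bump_b, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("a=%ld b=%ld\n", line.a, line.b);  // counts are exact; speed is not
        return 0;
    }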