AI Glossary
The Complete Dictionary of Artificial Intelligence
GPU-Accelerated Computing
Parallel computing architecture using graphics processors (GPUs) to accelerate the execution of compute-intensive applications, particularly AI model training.
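A minimal CUDA sketch of the idea, with illustrative sizes: a data-parallel loop is offloaded to thousands of GPU threads that each handle one array element.

    #include <cuda_runtime.h>
    #include <stdio.h>

    // Each GPU thread computes one element; many thousands run in parallel.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main(void) {
        const int n = 1 << 20;                    // 1M elements (illustrative)
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);             // unified memory, for brevity
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads; // enough blocks to cover n
        vecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();                  // wait for the GPU to finish

        printf("c[0] = %f\n", c[0]);              // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }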
PCIe (Peripheral Component Interconnect Express)
High-speed serial bus standard for connecting internal computer components, such as GPUs, network cards, and SSDs, to the CPU.
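As a rough worked example with PCIe 4.0 numbers (16 GT/s per lane, 128b/130b encoding), a x16 link carries per direction approximately

    16 \,\mathrm{GT/s} \times \tfrac{128}{130} \approx 15.75 \,\mathrm{Gbit/s\ per\ lane}, \qquad 16 \times 15.75 \,\mathrm{Gbit/s} \approx 252 \,\mathrm{Gbit/s} \approx 31.5 \,\mathrm{GB/s}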
InfiniBand
Very high-performance, low-latency computer networking standard, primarily used in HPC clusters for interconnecting compute nodes.
RDMA (Remote Direct Memory Access)
Technology allowing one computer to read from or write to another computer's memory directly over the network, without involving either host's operating system or CPU on the data path, thereby reducing latency and CPU load.
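The setup half of an RDMA transfer can be sketched with the Linux verbs API (libibverbs). The sketch below only opens a device and registers a buffer that the NIC may then access directly; queue-pair creation and the out-of-band exchange of the buffer address and rkey with the peer are omitted for brevity.

    #include <infiniband/verbs.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        // Enumerate RDMA-capable devices and open the first one.
        struct ibv_device **devs = ibv_get_device_list(NULL);
        if (!devs || !devs[0]) { fprintf(stderr, "no RDMA device\n"); return 1; }
        struct ibv_context *ctx = ibv_open_device(devs[0]);

        // A protection domain scopes which resources may touch which memory.
        struct ibv_pd *pd = ibv_alloc_pd(ctx);

        // Register (pin) a buffer so the NIC can DMA to and from it directly,
        // bypassing both operating systems on the data path.
        size_t len = 4096;
        void *buf = malloc(len);
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE);

        // A peer needs buf's address and mr->rkey to issue RDMA reads/writes.
        printf("rkey=0x%x addr=%p\n", mr->rkey, buf);

        ibv_dereg_mr(mr);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        free(buf);
        return 0;
    }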
NUMA (Non-Uniform Memory Access)
Multiprocessor memory architecture where memory access time depends on the memory location relative to the processor, a crucial factor for optimizing HPC applications.
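A hedged C sketch of the standard optimization, using libnuma (link with -lnuma; the node number is illustrative): keep a thread and the memory it works on within the same NUMA node, so every access is local.

    #include <numa.h>
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) { fprintf(stderr, "no NUMA support\n"); return 1; }

        printf("NUMA nodes: %d\n", numa_num_configured_nodes());

        // Allocate on node 0 and pin the calling thread to the same node,
        // avoiding slower remote-node memory accesses.
        size_t len = 64 * 1024 * 1024;
        void *buf = numa_alloc_onnode(len, 0);
        numa_run_on_node(0);

        // ... compute on buf ...

        numa_free(buf, len);
        return 0;
    }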
Fabric Interconnect
High-speed, low-latency interconnection network that links compute nodes, GPUs, and storage systems within an HPC cluster, forming a unified communication 'fabric'.
DGX
NVIDIA's reference integrated AI system, combining GPUs interconnected via NVLink, an optimized system architecture, and a complete software stack to accelerate AI development and deployment. (DGX is a product name rather than an official acronym.)
RoCE (RDMA over Converged Ethernet)
Protocol enabling RDMA technology implementation over standard Ethernet networks, providing an alternative to InfiniBand for low-latency, high-bandwidth interconnects.
Slurm (Simple Linux Utility for Resource Management)
Open-source resource manager and job scheduler, widely used in HPC clusters, that allocates CPU, GPU, and memory resources and schedules the execution of AI workloads.
GPUDirect
Set of NVIDIA technologies allowing a GPU to directly access the memory of other devices (other GPUs, NICs, SSDs) without going through CPU memory, reducing latency and data copying.
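One member of this family, GPUDirect Peer-to-Peer, is visible directly in the CUDA runtime API. A minimal sketch, assuming two P2P-capable GPUs in the same node:

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void) {
        int can01 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);   // can GPU 0 reach GPU 1's memory?
        if (!can01) { fprintf(stderr, "no P2P path between GPU 0 and 1\n"); return 1; }

        size_t bytes = 1 << 20;
        float *src, *dst;

        cudaSetDevice(1);
        cudaMalloc(&src, bytes);                 // buffer on GPU 1

        cudaSetDevice(0);
        cudaMalloc(&dst, bytes);                 // buffer on GPU 0
        cudaDeviceEnablePeerAccess(1, 0);        // let GPU 0 address GPU 1 directly

        // Copy GPU 1 -> GPU 0 over NVLink/PCIe without staging in CPU memory.
        cudaMemcpyPeer(dst, 0, src, 1, bytes);
        cudaDeviceSynchronize();

        printf("peer copy done\n");
        cudaFree(dst);
        cudaSetDevice(1);
        cudaFree(src);
        return 0;
    }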
SHARP (Scalable Hierarchical Aggregation and Reduction Protocol)
Network acceleration technology that offloads collective reduction operations (e.g., the all-reduce used in distributed training) into the network switches themselves, freeing up CPU/GPU resources and reducing the data crossing the fabric.
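From application code, SHARP is transparent: the program issues an ordinary all-reduce (for example through NCCL), and a SHARP-capable fabric executes the reduction inside the switches. A single-process NCCL sketch over the GPUs of one node (buffer size illustrative):

    #include <nccl.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void) {
        int ndev = 0;
        cudaGetDeviceCount(&ndev);
        if (ndev > 8) ndev = 8;                   // fixed-size arrays below

        ncclComm_t comms[8];
        cudaStream_t streams[8];
        float *buf[8];
        const size_t count = 1 << 20;             // 1M floats (illustrative)

        ncclCommInitAll(comms, ndev, NULL);       // one communicator per GPU

        for (int i = 0; i < ndev; i++) {
            cudaSetDevice(i);
            cudaStreamCreate(&streams[i]);
            cudaMalloc(&buf[i], count * sizeof(float));
            cudaMemset(buf[i], 0, count * sizeof(float));
        }

        // Element-wise sum of buf across all GPUs, written back in place.
        // On a SHARP-enabled fabric, the inter-node part of this reduction
        // runs inside the switches instead of on the endpoints.
        ncclGroupStart();
        for (int i = 0; i < ndev; i++)
            ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                          comms[i], streams[i]);
        ncclGroupEnd();

        for (int i = 0; i < ndev; i++) {
            cudaSetDevice(i);
            cudaStreamSynchronize(streams[i]);
            cudaFree(buf[i]);
            ncclCommDestroy(comms[i]);
        }
        printf("all-reduce done on %d GPUs\n", ndev);
        return 0;
    }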
Node-to-Node Latency
Measurement of the delay time for transmitting a data packet between two distinct compute nodes in a cluster, a key performance indicator for distributed AI algorithms.
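Node-to-node latency is commonly measured with a ping-pong microbenchmark. A C/MPI sketch, to be launched with two ranks placed on two different nodes:

    #include <mpi.h>
    #include <stdio.h>

    // Classic ping-pong: ranks 0 and 1 bounce a 1-byte message back and
    // forth; half the average round-trip time approximates one-way latency.
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char byte = 0;
        const int iters = 10000;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("one-way latency ~ %.2f us\n",
                   (t1 - t0) / iters / 2.0 * 1e6);

        MPI_Finalize();
        return 0;
    }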
Bisection Bandwidth
Measurement of an interconnection network's overall communication capacity, defined as the minimum aggregate bandwidth of the links that must be cut to divide the network into two equal halves; a key indicator of how well a cluster supports all-to-all communication.
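A worked example, assuming a non-blocking fat-tree with 64 nodes and 200 Gbit/s links: any cut into two halves of 32 nodes crosses at least 32 links, so

    \mathrm{BW}_{\mathrm{bisection}} = 32 \times 200 \,\mathrm{Gbit/s} = 6.4 \,\mathrm{Tbit/s} = 800 \,\mathrm{GB/s}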
Cache Coherency
Mechanism ensuring that all copies of a data block held in the different caches of a multiprocessor system stay in sync, essential for data consistency in multi-CPU/GPU HPC architectures.
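The cost of coherency traffic is visible from software as false sharing: two threads writing neighboring variables that happen to share one cache line force that line to bounce between cores. A hedged pthreads sketch (a 64-byte cache line and iteration count are assumed for illustration):

    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 100000000L

    // a and b share one cache line: every write by one core invalidates the
    // other core's cached copy, generating constant coherency traffic.
    // Padding them apart (e.g., char pad[64] between a and b) removes it.
    struct { volatile long a; volatile long b; } line;

    void *bump_a(void *arg) { for (long i = 0; i < ITERS; i++) line.a++; return NULL; }
    void *bump_b(void *arg) { for (long i = 0; i < ITERS; i++) line.b++; return NULL; }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump_a, NULL);
        pthread_create(&t2, NULL, bump_b, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("a=%ld b=%ld\n", line.a, line.b);  // counts are exact; speed is not
        return 0;
    }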