AI Glossary
The Complete Dictionary of Artificial Intelligence
CUDA
Parallel computing architecture and programming interface created by NVIDIA, allowing developers to use GPUs for general-purpose computing through extensions to the C/C++ language.
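A minimal sketch of the model: host code allocates device memory, copies data over, and launches a kernel written as a C++ function; names like vecAdd are illustrative, and error checks are omitted.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Kernel: each thread handles one element.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *hA = new float[n], *hC = new float[n];
        for (int i = 0; i < n; ++i) hA[i] = 1.0f;

        float *dA, *dB, *dC;
        cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hA, bytes, cudaMemcpyHostToDevice);

        vecAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);  // grid of 256-thread blocks
        cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

        printf("c[0] = %f\n", hC[0]);  // expect 2.0
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        delete[] hA; delete[] hC;
        return 0;
    }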
Tensor Core
Specialized compute units integrated into modern NVIDIA GPUs (Volta and later), designed to dramatically accelerate matrix multiply-accumulate operations (D = A×B + C), the workload at the heart of deep neural networks.
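A minimal WMMA sketch (compute capability 7.0 or newer, error handling omitted): one warp multiplies a pair of 16×16 FP16 tiles and accumulates in FP32 on the Tensor Cores.

    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    // One warp computes a 16x16 tile: D = A*B + C on Tensor Cores.
    __global__ void wmma16x16(const half* A, const half* B, float* C) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> aFrag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> bFrag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> cFrag;

        wmma::fill_fragment(cFrag, 0.0f);      // start the accumulator at zero
        wmma::load_matrix_sync(aFrag, A, 16);  // 16 = leading dimension
        wmma::load_matrix_sync(bFrag, B, 16);
        wmma::mma_sync(cFrag, aFrag, bFrag, cFrag);  // runs on Tensor Cores
        wmma::store_matrix_sync(C, cFrag, 16, wmma::mem_row_major);
    }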
ROCm
Open source computing platform for AMD GPUs, offering a complete ecosystem with a programming interface (HIP), libraries (such as MIOpen), and tools for high-performance computing and AI.
OpenCL
Open standard for writing programs that run on heterogeneous platforms, including CPUs, GPUs, and other processors, defining a language based on C99 and APIs for device management.
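A condensed host-side sketch, assuming an OpenCL 2.0 runtime and omitting all error checks; the kernel is compiled from a source string at run time and can target any conforming device.

    #include <CL/cl.h>
    #include <stdio.h>

    const char* src =
        "__kernel void scale(__global float* x) {"
        "    size_t i = get_global_id(0);"
        "    x[i] *= 2.0f;"
        "}";

    int main() {
        cl_platform_id platform; cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueueWithProperties(ctx, device, NULL, NULL);

        float data[4] = {1, 2, 3, 4};
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    sizeof(data), data, NULL);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "scale", NULL);
        clSetKernelArg(k, 0, sizeof(buf), &buf);

        size_t global = 4;
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);
        printf("%f\n", data[0]);  /* expect 2.0 */
        return 0;
    }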
cuDNN
GPU-accelerated library of primitives for deep neural networks, developed by NVIDIA, providing highly optimized implementations for convolution, pooling, and normalization routines.
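A minimal sketch of cuDNN's descriptor-based API, applying a ReLU activation to a 4D tensor (error checks omitted); convolution and pooling follow the same descriptor pattern.

    #include <cudnn.h>
    #include <cuda_runtime.h>

    int main() {
        cudnnHandle_t handle;
        cudnnCreate(&handle);

        const int n = 1, c = 3, h = 8, w = 8;
        float *x, *y;
        cudaMalloc(&x, n * c * h * w * sizeof(float));
        cudaMalloc(&y, n * c * h * w * sizeof(float));

        // Descriptors tell cuDNN the layout and type of the data.
        cudnnTensorDescriptor_t desc;
        cudnnCreateTensorDescriptor(&desc);
        cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, n, c, h, w);

        cudnnActivationDescriptor_t act;
        cudnnCreateActivationDescriptor(&act);
        cudnnSetActivationDescriptor(act, CUDNN_ACTIVATION_RELU, CUDNN_NOT_PROPAGATE_NAN, 0.0);

        const float alpha = 1.0f, beta = 0.0f;  // y = alpha * relu(x) + beta * y
        cudnnActivationForward(handle, act, &alpha, desc, x, &beta, desc, y);

        cudnnDestroy(handle);
        return 0;
    }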
Memory Bandwidth
Maximum data transfer rate between the GPU and its video memory (VRAM), measured in GB/s, constituting a critical factor for the performance of intensive computations and training of large AI models.
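A rough way to estimate effective bandwidth is to time a large device-to-device copy with CUDA events and divide bytes moved by elapsed time; a sketch follows, and measured numbers below the datasheet peak are normal.

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const size_t bytes = 1ULL << 28;  // 256 MiB
        void *src, *dst;
        cudaMalloc(&src, bytes);
        cudaMalloc(&dst, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        // A copy reads and writes every byte, hence the factor of 2.
        double gbps = 2.0 * bytes / (ms * 1e-3) / 1e9;
        printf("effective bandwidth: %.1f GB/s\n", gbps);
        return 0;
    }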
Kernel
Function executed on the GPU in a parallel computing program, launched over a grid of thread blocks so that each thread processes its own portion of the data simultaneously.
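A common pattern is the grid-stride loop, sketched below, which decouples the launch configuration from the problem size.

    // Grid-stride loop: works for any n, with each thread striding
    // across the grid to cover its share of the data.
    __global__ void saxpy(float a, const float* x, float* y, int n) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += blockDim.x * gridDim.x) {
            y[i] = a * x[i] + y[i];
        }
    }
    // Launch example: saxpy<<<numBlocks, 256>>>(2.0f, x, y, n);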
Warp
Group of 32 threads executed in SIMT (Single Instruction, Multiple Thread) mode on NVIDIA GPUs, sharing the same instruction stream and constituting the basic scheduling unit for parallel execution.
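A sketch of a warp-level reduction: __shfl_down_sync moves values between lanes through registers, so the 32 threads of a warp can sum their values without touching shared memory.

    // Warp-level sum; after the loop, lane 0 holds the total of all 32 lanes.
    __inline__ __device__ float warpReduceSum(float val) {
        for (int offset = 16; offset > 0; offset /= 2)
            val += __shfl_down_sync(0xffffffff, val, offset);  // full-warp mask
        return val;
    }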
Streaming Multiprocessor (SM)
Basic computing unit of an NVIDIA GPU, containing CUDA cores, shared memory, and warp schedulers, and capable of executing multiple thread blocks concurrently.
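The SM count of a device can be queried at run time, which is useful because occupancy is usually reasoned about in blocks per SM; a minimal sketch:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);  // device 0
        printf("%s: %d SMs, %d max threads per SM\n",
               prop.name, prop.multiProcessorCount,
               prop.maxThreadsPerMultiProcessor);
        return 0;
    }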
Shared Memory
Fast, low-latency memory space shared among threads within the same block on a GPU, enabling collaboration and reducing accesses to the much slower global memory.
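A sketch of the classic use case, a block-level reduction staged in shared memory (launch with 256 threads per block):

    __global__ void blockSum(const float* in, float* out) {
        __shared__ float tile[256];          // one element per thread, on-chip
        int tid = threadIdx.x;
        tile[tid] = in[blockIdx.x * blockDim.x + tid];
        __syncthreads();                     // tile fully populated

        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s) tile[tid] += tile[tid + s];
            __syncthreads();                 // every thread must pass each step
        }
        if (tid == 0) out[blockIdx.x] = tile[0];  // one partial sum per block
    }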
Unified Memory
Memory management technology that creates a single address space between the CPU and GPU, eliminating the need for explicit data copies and simplifying the development of heterogeneous applications.
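A minimal sketch: one pointer from cudaMallocManaged is dereferenced by both the CPU and the GPU, with no explicit copies.

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void increment(int* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;
    }

    int main() {
        const int n = 1024;
        int* data;
        cudaMallocManaged(&data, n * sizeof(int));  // visible to CPU and GPU
        for (int i = 0; i < n; ++i) data[i] = i;    // written by the CPU

        increment<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();                    // wait before the CPU reads

        printf("data[0] = %d\n", data[0]);          // expect 1
        cudaFree(data);
        return 0;
    }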
NVLink
High-bandwidth interconnect technology developed by NVIDIA, enabling direct and fast communication between multiple GPUs, surpassing the limitations of the PCIe bus for distributed computing.
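From CUDA, NVLink is typically exercised through peer-to-peer access, which falls back to PCIe when no NVLink connection exists; a minimal sketch assuming two visible GPUs:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can GPU 0 reach GPU 1?
        if (canAccess) {
            cudaSetDevice(0);
            cudaDeviceEnablePeerAccess(1, 0);  // GPU 0 may now read/write GPU 1
            printf("P2P enabled between GPU 0 and GPU 1\n");
        }
        return 0;
    }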
FP16 (Half-Precision)
16-bit floating-point number format used to accelerate computations and reduce memory footprint in neural networks, at the cost of a slight precision loss that is often acceptable.
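A sketch of device-side FP16 arithmetic (compute capability 5.3 or newer): half2 packs two 16-bit values, so a single instruction processes two elements at once.

    #include <cuda_fp16.h>

    // y = a*x + y over pairs of halves; n counts half2 elements.
    __global__ void haxpy(half a, const half2* x, half2* y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            half2 a2 = __half2half2(a);        // broadcast scalar into both lanes
            y[i] = __hfma2(a2, x[i], y[i]);    // fused multiply-add on two halves
        }
    }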
CUDA Graphs
Technology that allows capturing an entire sequence of CUDA operations in a graph, then re-executing it with minimal overhead, reducing kernel launch costs for repetitive workloads.
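A minimal stream-capture sketch: the launches are recorded into a graph once, then replayed cheaply; the step kernel is a placeholder.

    #include <cuda_runtime.h>

    __global__ void step() { /* placeholder work */ }

    int main() {
        cudaStream_t stream;
        cudaStreamCreate(&stream);

        cudaGraph_t graph;
        cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
        step<<<1, 32, 0, stream>>>();   // captured, not executed yet
        step<<<1, 32, 0, stream>>>();
        cudaStreamEndCapture(stream, &graph);

        cudaGraphExec_t exec;
        // CUDA 11 signature; CUDA 12+ uses cudaGraphInstantiate(&exec, graph, 0).
        cudaGraphInstantiate(&exec, graph, NULL, NULL, 0);
        for (int i = 0; i < 1000; ++i)
            cudaGraphLaunch(exec, stream);  // one launch replays both kernels
        cudaStreamSynchronize(stream);
        return 0;
    }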
HIP
C++ runtime API and kernel language developed by AMD, designed as a portable counterpart to CUDA that makes it easier to migrate CUDA code to AMD GPUs.
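A minimal sketch showing how closely HIP mirrors the CUDA runtime: hip* calls replace their cuda* counterparts and the kernel syntax is unchanged.

    #include <hip/hip_runtime.h>

    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1024;
        float *a, *b, *c;
        hipMalloc(&a, n * sizeof(float));
        hipMalloc(&b, n * sizeof(float));
        hipMalloc(&c, n * sizeof(float));
        // Macro form of the launch; hipcc also accepts <<<...>>> syntax.
        hipLaunchKernelGGL(vecAdd, dim3((n + 255) / 256), dim3(256), 0, 0, a, b, c, n);
        hipDeviceSynchronize();
        hipFree(a); hipFree(b); hipFree(c);
        return 0;
    }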
MIOpen
Library of deep neural network primitives for AMD's ROCm platform (the counterpart of cuDNN), providing high-performance implementations of convolution, pooling, and normalization layers.
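A heavily hedged sketch of MIOpen's descriptor pattern, which mirrors cuDNN's; verify the exact signatures against the MIOpen documentation for your ROCm version.

    #include <miopen/miopen.h>

    int main() {
        miopenHandle_t handle;
        miopenCreate(&handle);

        miopenTensorDescriptor_t desc;
        miopenCreateTensorDescriptor(&desc);
        miopenSet4dTensorDescriptor(desc, miopenFloat, 1, 3, 8, 8);  // NCHW

        // ... create an activation or convolution descriptor, then call the
        // corresponding Forward routine, as in the cuDNN example above.
        miopenDestroy(handle);
        return 0;
    }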
Compute Capability
Version number (major.minor, e.g., 8.0 for Ampere) identifying the architectural feature set of an NVIDIA GPU, such as supported instructions and resource limits, and essential for software compatibility.
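The value can be queried at run time; a minimal sketch (in device code, the related __CUDA_ARCH__ macro allows compile-time branching per architecture):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        // e.g., 7.0 = Volta, 8.0 = Ampere, 9.0 = Hopper
        printf("compute capability %d.%d\n", prop.major, prop.minor);
        return 0;
    }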
Coalesced Memory Access
Memory access optimization where adjacent threads in a warp access contiguous memory locations, allowing these requests to be combined into a single wide and efficient memory transaction.
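A sketch contrasting the two patterns: in the first kernel, consecutive threads read consecutive floats and each warp's loads merge into wide transactions, while the strided version splinters into many separate ones.

    __global__ void copyCoalesced(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];            // thread i touches element i
    }

    __global__ void copyStrided(const float* in, float* out, int n, int stride) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i * stride < n) out[i * stride] = in[i * stride];  // scattered accesses
    }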