AI Glossary
The Complete Dictionary of Artificial Intelligence
CUDA
Parallel computing architecture and programming interface created by NVIDIA, allowing developers to use GPUs for general-purpose computing through extensions to the C/C++ language.
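A minimal sketch of the model: host code allocates device memory, copies data over, and launches a kernel written as a C++ function; names like vecAdd are illustrative, and error checks are omitted.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Kernel: each thread handles one element.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *hA = new float[n], *hC = new float[n];
        for (int i = 0; i < n; ++i) hA[i] = 1.0f;

        float *dA, *dB, *dC;
        cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hA, bytes, cudaMemcpyHostToDevice);

        vecAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);  // grid of 256-thread blocks
        cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

        printf("c[0] = %f\n", hC[0]);  // expect 2.0
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        delete[] hA; delete[] hC;
        return 0;
    }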
Tensor Core
Specialized compute units integrated into modern NVIDIA GPUs (Volta and later), designed to dramatically accelerate matrix multiply-accumulate operations (D = A×B + C), the workload at the heart of deep neural networks.
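A minimal WMMA sketch (compute capability 7.0 or newer, error handling omitted): one warp multiplies a pair of 16×16 FP16 tiles and accumulates in FP32 on the Tensor Cores.

    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    // One warp computes a 16x16 tile: D = A*B + C on Tensor Cores.
    __global__ void wmma16x16(const half* A, const half* B, float* C) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> aFrag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> bFrag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> cFrag;

        wmma::fill_fragment(cFrag, 0.0f);      // start the accumulator at zero
        wmma::load_matrix_sync(aFrag, A, 16);  // 16 = leading dimension
        wmma::load_matrix_sync(bFrag, B, 16);
        wmma::mma_sync(cFrag, aFrag, bFrag, cFrag);  // runs on Tensor Cores
        wmma::store_matrix_sync(C, cFrag, 16, wmma::mem_row_major);
    }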
ROCm
Open source computing platform for AMD GPUs, offering a complete ecosystem with a programming interface (HIP), libraries (such as MIOpen), and tools for high-performance computing and AI.
OpenCL
Open standard for writing programs that run on heterogeneous platforms, including CPUs, GPUs, and other processors, defining a language based on C99 and APIs for device management.
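A condensed host-side sketch, assuming an OpenCL 2.0 runtime and omitting all error checks; the kernel is compiled from a source string at run time and can target any conforming device.

    #include <CL/cl.h>
    #include <stdio.h>

    const char* src =
        "__kernel void scale(__global float* x) {"
        "    size_t i = get_global_id(0);"
        "    x[i] *= 2.0f;"
        "}";

    int main() {
        cl_platform_id platform; cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueueWithProperties(ctx, device, NULL, NULL);

        float data[4] = {1, 2, 3, 4};
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    sizeof(data), data, NULL);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "scale", NULL);
        clSetKernelArg(k, 0, sizeof(buf), &buf);

        size_t global = 4;
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);
        printf("%f\n", data[0]);  /* expect 2.0 */
        return 0;
    }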
cuDNN
GPU-accelerated library of primitives for deep neural networks, developed by NVIDIA, providing highly optimized implementations for convolution, pooling, and normalization routines.
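A minimal sketch of cuDNN's descriptor-based API, applying a ReLU activation to a 4D tensor (error checks omitted); convolution and pooling follow the same descriptor pattern.

    #include <cudnn.h>
    #include <cuda_runtime.h>

    int main() {
        cudnnHandle_t handle;
        cudnnCreate(&handle);

        const int n = 1, c = 3, h = 8, w = 8;
        float *x, *y;
        cudaMalloc(&x, n * c * h * w * sizeof(float));
        cudaMalloc(&y, n * c * h * w * sizeof(float));

        // Descriptors tell cuDNN the layout and type of the data.
        cudnnTensorDescriptor_t desc;
        cudnnCreateTensorDescriptor(&desc);
        cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, n, c, h, w);

        cudnnActivationDescriptor_t act;
        cudnnCreateActivationDescriptor(&act);
        cudnnSetActivationDescriptor(act, CUDNN_ACTIVATION_RELU, CUDNN_NOT_PROPAGATE_NAN, 0.0);

        const float alpha = 1.0f, beta = 0.0f;  // y = alpha * relu(x) + beta * y
        cudnnActivationForward(handle, act, &alpha, desc, x, &beta, desc, y);

        cudnnDestroy(handle);
        return 0;
    }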
Memory Bandwidth
Maximum data transfer rate between the GPU and its video memory (VRAM), measured in GB/s, constituting a critical factor for the performance of intensive computations and training of large AI models.
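A rough way to estimate effective bandwidth is to time a large device-to-device copy with CUDA events and divide bytes moved by elapsed time; a sketch follows, and measured numbers below the datasheet peak are normal.

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const size_t bytes = 1ULL << 28;  // 256 MiB
        void *src, *dst;
        cudaMalloc(&src, bytes);
        cudaMalloc(&dst, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        // A copy reads and writes every byte, hence the factor of 2.
        double gbps = 2.0 * bytes / (ms * 1e-3) / 1e9;
        printf("effective bandwidth: %.1f GB/s\n", gbps);
        return 0;
    }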
Kernel
Function executed on the GPU in a parallel computing program, launched over a grid of thread blocks so that each thread processes its own portion of the data simultaneously.
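A common pattern is the grid-stride loop, sketched below, which decouples the launch configuration from the problem size.

    // Grid-stride loop: works for any n, with each thread striding
    // across the grid to cover its share of the data.
    __global__ void saxpy(float a, const float* x, float* y, int n) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += blockDim.x * gridDim.x) {
            y[i] = a * x[i] + y[i];
        }
    }
    // Launch example: saxpy<<<numBlocks, 256>>>(2.0f, x, y, n);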
Warp
Group of 32 threads executed in SIMT (Single Instruction, Multiple Thread) mode on NVIDIA GPUs, sharing the same instruction stream and constituting the basic scheduling unit for parallel execution.
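A sketch of a warp-level reduction: __shfl_down_sync moves values between lanes through registers, so the 32 threads of a warp can sum their values without touching shared memory.

    // Warp-level sum; after the loop, lane 0 holds the total of all 32 lanes.
    __inline__ __device__ float warpReduceSum(float val) {
        for (int offset = 16; offset > 0; offset /= 2)
            val += __shfl_down_sync(0xffffffff, val, offset);  // full-warp mask
        return val;
    }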
Streaming Multiprocessor (SM)
Basic computing unit of an NVIDIA GPU, containing CUDA cores, shared memory, and warp schedulers, and capable of executing multiple thread blocks concurrently.
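The SM count of a device can be queried at run time, which is useful because occupancy is usually reasoned about in blocks per SM; a minimal sketch:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);  // device 0
        printf("%s: %d SMs, %d max threads per SM\n",
               prop.name, prop.multiProcessorCount,
               prop.maxThreadsPerMultiProcessor);
        return 0;
    }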
Shared Memory
Fast, low-latency memory space shared among threads within the same block on a GPU, enabling collaboration and reducing accesses to the much slower global memory.
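A sketch of the classic use case, a block-level reduction staged in shared memory (launch with 256 threads per block):

    __global__ void blockSum(const float* in, float* out) {
        __shared__ float tile[256];          // one element per thread, on-chip
        int tid = threadIdx.x;
        tile[tid] = in[blockIdx.x * blockDim.x + tid];
        __syncthreads();                     // tile fully populated

        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s) tile[tid] += tile[tid + s];
            __syncthreads();                 // every thread must pass each step
        }
        if (tid == 0) out[blockIdx.x] = tile[0];  // one partial sum per block
    }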
Unified Memory
Memory management technology that creates a single address space between the CPU and GPU, eliminating the need for explicit data copies and simplifying the development of heterogeneous applications.
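A minimal sketch: one pointer from cudaMallocManaged is dereferenced by both the CPU and the GPU, with no explicit copies.

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void increment(int* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;
    }

    int main() {
        const int n = 1024;
        int* data;
        cudaMallocManaged(&data, n * sizeof(int));  // visible to CPU and GPU
        for (int i = 0; i < n; ++i) data[i] = i;    // written by the CPU

        increment<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();                    // wait before the CPU reads

        printf("data[0] = %d\n", data[0]);          // expect 1
        cudaFree(data);
        return 0;
    }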
NVLink
High-bandwidth interconnect technology developed by NVIDIA, enabling direct and fast communication between multiple GPUs, surpassing the limitations of the PCIe bus for distributed computing.
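From CUDA, NVLink is typically exercised through peer-to-peer access, which falls back to PCIe when no NVLink connection exists; a minimal sketch assuming two visible GPUs:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can GPU 0 reach GPU 1?
        if (canAccess) {
            cudaSetDevice(0);
            cudaDeviceEnablePeerAccess(1, 0);  // GPU 0 may now read/write GPU 1
            printf("P2P enabled between GPU 0 and GPU 1\n");
        }
        return 0;
    }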
FP16 (Half-Precision)
16-bit floating-point number format used to accelerate computations and reduce memory footprint in neural networks, at the cost of a slight precision loss that is often acceptable.
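A sketch of device-side FP16 arithmetic (compute capability 5.3 or newer): half2 packs two 16-bit values, so a single instruction processes two elements at once.

    #include <cuda_fp16.h>

    // y = a*x + y over pairs of halves; n counts half2 elements.
    __global__ void haxpy(half a, const half2* x, half2* y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            half2 a2 = __half2half2(a);        // broadcast scalar into both lanes
            y[i] = __hfma2(a2, x[i], y[i]);    // fused multiply-add on two halves
        }
    }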
CUDA Graphs
Technology that allows capturing an entire sequence of CUDA operations in a graph, then re-executing it with minimal overhead, reducing kernel launch costs for repetitive workloads.
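A minimal stream-capture sketch: the launches are recorded into a graph once, then replayed cheaply; the step kernel is a placeholder.

    #include <cuda_runtime.h>

    __global__ void step() { /* placeholder work */ }

    int main() {
        cudaStream_t stream;
        cudaStreamCreate(&stream);

        cudaGraph_t graph;
        cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
        step<<<1, 32, 0, stream>>>();   // captured, not executed yet
        step<<<1, 32, 0, stream>>>();
        cudaStreamEndCapture(stream, &graph);

        cudaGraphExec_t exec;
        // CUDA 11 signature; CUDA 12+ uses cudaGraphInstantiate(&exec, graph, 0).
        cudaGraphInstantiate(&exec, graph, NULL, NULL, 0);
        for (int i = 0; i < 1000; ++i)
            cudaGraphLaunch(exec, stream);  // one launch replays both kernels
        cudaStreamSynchronize(stream);
        return 0;
    }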
HIP
C++ runtime API and kernel language developed by AMD, designed as a portable counterpart to CUDA that makes it easier to migrate CUDA code to AMD GPUs.
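A minimal sketch showing how closely HIP mirrors the CUDA runtime: hip* calls replace their cuda* counterparts and the kernel syntax is unchanged.

    #include <hip/hip_runtime.h>

    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1024;
        float *a, *b, *c;
        hipMalloc(&a, n * sizeof(float));
        hipMalloc(&b, n * sizeof(float));
        hipMalloc(&c, n * sizeof(float));
        // Macro form of the launch; hipcc also accepts <<<...>>> syntax.
        hipLaunchKernelGGL(vecAdd, dim3((n + 255) / 256), dim3(256), 0, 0, a, b, c, n);
        hipDeviceSynchronize();
        hipFree(a); hipFree(b); hipFree(c);
        return 0;
    }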
MIOpen
Library of deep neural network primitives for AMD's ROCm platform (the counterpart of cuDNN), providing high-performance implementations of convolution, pooling, and normalization layers.
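A heavily hedged sketch of MIOpen's descriptor pattern, which mirrors cuDNN's; verify the exact signatures against the MIOpen documentation for your ROCm version.

    #include <miopen/miopen.h>

    int main() {
        miopenHandle_t handle;
        miopenCreate(&handle);

        miopenTensorDescriptor_t desc;
        miopenCreateTensorDescriptor(&desc);
        miopenSet4dTensorDescriptor(desc, miopenFloat, 1, 3, 8, 8);  // NCHW

        // ... create an activation or convolution descriptor, then call the
        // corresponding Forward routine, as in the cuDNN example above.
        miopenDestroy(handle);
        return 0;
    }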
Compute Capability
Version number (major.minor, e.g., 8.0 for Ampere) identifying the architectural feature set of an NVIDIA GPU, such as supported instructions and resource limits, and essential for software compatibility.
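The value can be queried at run time; a minimal sketch (in device code, the related __CUDA_ARCH__ macro allows compile-time branching per architecture):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        // e.g., 7.0 = Volta, 8.0 = Ampere, 9.0 = Hopper
        printf("compute capability %d.%d\n", prop.major, prop.minor);
        return 0;
    }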
Coalesced Memory Access
Memory access optimization where adjacent threads in a warp access contiguous memory locations, allowing these requests to be combined into a single wide and efficient memory transaction.
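A sketch contrasting the two patterns: in the first kernel, consecutive threads read consecutive floats and each warp's loads merge into wide transactions, while the strided version splinters into many separate ones.

    __global__ void copyCoalesced(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];            // thread i touches element i
    }

    __global__ void copyStrided(const float* in, float* out, int n, int stride) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i * stride < n) out[i * stride] = in[i * stride];  // scattered accesses
    }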