AI Glossary
The complete dictionary of artificial intelligence
Kernel
A function, declared with the __global__ qualifier, that runs on the GPU and is executed in parallel by many threads. A kernel is launched from host (CPU) code with an execution configuration that sets the grid and block dimensions, written kernel<<<blocks, threadsPerBlock>>>(arguments) in CUDA C++.
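A minimal sketch of a kernel and its launch, assuming a CUDA-capable GPU and compilation with nvcc. The names vector_add, add_on_gpu and check are illustrative, not part of the CUDA API, and 256 threads per block is an arbitrary but common choice; the block count is rounded up so that every element gets a thread.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Kernel: each thread adds one pair of elements.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index of this thread
    if (i < n) {                                    // the grid may contain more threads than elements
        c[i] = a[i] + b[i];
    }
}

// Illustrative host wrapper (hypothetical name): copy in, launch, copy back.
std::vector<float> add_on_gpu(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) return c;  // a launch with zero blocks would be an invalid configuration
    size_t bytes = n * sizeof(float);

    float* d_a = nullptr;
    float* d_b = nullptr;
    float* d_c = nullptr;
    check(cudaMalloc(&d_a, bytes));
    check(cudaMalloc(&d_b, bytes));
    check(cudaMalloc(&d_c, bytes));
    check(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));

    int threads_per_block = 256;
    int blocks = (n + threads_per_block - 1) / threads_per_block;  // round up
    vector_add<<<blocks, threads_per_block>>>(d_a, d_b, d_c, n);
    check(cudaGetLastError());  // reports launch-configuration errors

    // Blocking copy on the default stream: it waits for the kernel to finish.
    check(cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_a));
    check(cudaFree(d_b));
    check(cudaFree(d_c));
    return c;
}
```

Because the rounded-up grid usually contains a few more threads than elements, the kernel guards its write with i < n.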
Thread
The basic unit of execution in CUDA: a single sequential stream of instructions. All threads of a kernel run the same code on different data, and each thread selects its data through built-in index variables such as threadIdx, blockIdx and blockDim. Threads are grouped into blocks.
Block
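A group of threads that can cooperate through shared memory and synchronize with each other (for example with __syncthreads()). All threads of a block run on the same Streaming Multiprocessor (SM), while different blocks of a grid may be scheduled on different SMs, in any order. On current GPUs a block holds at most 1024 threads.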
Grid
The set of all thread blocks launched for one kernel call. The grid is the top level of the thread hierarchy (thread, block, grid); both the grid and its blocks can be one-, two- or three-dimensional (type dim3).
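Grids and blocks can be described in two dimensions with dim3. This sketch assumes a row-major matrix and uses the illustrative names transpose, transpose_on_gpu and check; it maps the x dimension to columns and the y dimension to rows so that each thread moves one element.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// One thread per matrix element: in is rows x cols, out is cols x rows (both row-major).
__global__ void transpose(const float* in, float* out, int rows, int cols) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // x dimension indexes columns
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // y dimension indexes rows
    if (row < rows && col < cols) {
        out[col * rows + row] = in[row * cols + col];
    }
}

// Illustrative host wrapper (hypothetical name).
std::vector<float> transpose_on_gpu(const std::vector<float>& in, int rows, int cols) {
    assert(rows > 0 && cols > 0 && static_cast<int>(in.size()) == rows * cols);
    size_t bytes = in.size() * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice));

    dim3 block(16, 16);                          // 16 x 16 = 256 threads per block
    dim3 grid((cols + block.x - 1) / block.x,    // enough blocks to cover every column
              (rows + block.y - 1) / block.y);   // and every row
    transpose<<<grid, block>>>(d_in, d_out, rows, cols);
    check(cudaGetLastError());

    std::vector<float> out(in.size());
    check(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return out;
}
```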
Warp
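A group of 32 threads that an SM schedules together in SIMT (Single Instruction, Multiple Threads) mode. The threads of a warp issue a common instruction at a time; when they take different branches (warp divergence), the paths are executed one after another with the inactive threads masked off, which lowers throughput. Since the Volta architecture each thread has its own program counter (independent thread scheduling), so code should not assume that a warp executes in lockstep.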
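Threads of one warp can exchange register values directly with warp shuffle intrinsics. A minimal sketch, assuming a warp size of 32 and a single block whose size is a multiple of 32: __shfl_down_sync adds values from higher lanes until lane 0 holds the warp's total. The names warp_sum, warp_sums_on_gpu and check are illustrative.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Each warp sums its 32 values without shared memory.
__global__ void warp_sum(const int* in, int* out) {
    int lane = threadIdx.x % warpSize;  // position of this thread inside its warp
    int value = in[threadIdx.x];
    // Offsets 16, 8, 4, 2, 1: each step folds the upper half of the partial sums onto the lower half.
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        value += __shfl_down_sync(0xffffffff, value, offset);
    }
    if (lane == 0) {
        out[threadIdx.x / warpSize] = value;  // one result per warp
    }
}

// Illustrative host wrapper (hypothetical name): one sum per consecutive group of 32 values.
std::vector<int> warp_sums_on_gpu(const std::vector<int>& values) {
    const int kWarp = 32;  // assumption: warp size of 32, as on current NVIDIA GPUs
    int n = static_cast<int>(values.size());
    assert(n > 0 && n % kWarp == 0 && n <= 1024);  // a single block of whole warps
    int* d_in = nullptr;
    int* d_out = nullptr;
    check(cudaMalloc(&d_in, n * sizeof(int)));
    check(cudaMalloc(&d_out, (n / kWarp) * sizeof(int)));
    check(cudaMemcpy(d_in, values.data(), n * sizeof(int), cudaMemcpyHostToDevice));

    warp_sum<<<1, n>>>(d_in, d_out);
    check(cudaGetLastError());

    std::vector<int> sums(n / kWarp);
    check(cudaMemcpy(sums.data(), d_out, sums.size() * sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return sums;
}
```

The full mask 0xffffffff is valid here because all 32 lanes of every warp execute the shuffle; every lane named in the mask must take part in the call.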
Shared Memory
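Fast on-chip memory, declared with the __shared__ qualifier, that is shared by all threads of the same block and exists only as long as that block runs. It has much lower latency than global memory and enables efficient communication between the threads of a block, but its capacity is small: by default at most 48 KB per block, with larger amounts available on some architectures through an explicit opt-in. Threads must synchronize (for example with __syncthreads()) before reading values written by other threads.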
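A classic use of shared memory is a block-level reduction. In this sketch (illustrative names block_sum, sum_on_gpu and check, with the block size fixed at 256, a power of two), each block loads its slice into a __shared__ array, reduces it as a tree with __syncthreads() between steps, and writes one partial sum; the host adds the partial sums.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

constexpr int kBlock = 256;  // threads per block; must be a power of two for this reduction

__global__ void block_sum(const float* in, float* partial, int n) {
    __shared__ float tile[kBlock];  // one slot per thread, visible to the whole block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // pad past the end with zeros
    __syncthreads();                             // all loads finish before anyone reads tile

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = kBlock / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        __syncthreads();  // finish this level before starting the next
    }
    if (threadIdx.x == 0) {
        partial[blockIdx.x] = tile[0];  // one partial sum per block
    }
}

// Illustrative host wrapper (hypothetical name).
float sum_on_gpu(const std::vector<float>& values) {
    int n = static_cast<int>(values.size());
    assert(n > 0);
    int blocks = (n + kBlock - 1) / kBlock;
    float* d_in = nullptr;
    float* d_partial = nullptr;
    check(cudaMalloc(&d_in, n * sizeof(float)));
    check(cudaMalloc(&d_partial, blocks * sizeof(float)));
    check(cudaMemcpy(d_in, values.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    block_sum<<<blocks, kBlock>>>(d_in, d_partial, n);
    check(cudaGetLastError());

    std::vector<float> partial(blocks);
    check(cudaMemcpy(partial.data(), d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_partial));

    float total = 0.0f;
    for (float p : partial) total += p;  // final small reduction on the host
    return total;
}
```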
Global Memory
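The GPU's main device memory (DRAM), accessible to all threads of all kernels and read and written by the host through the runtime API (for example with cudaMemcpy). It offers large capacity but high latency. Allocations persist across kernel launches until they are freed, which makes global memory the main storage area for data.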
CUDA Runtime API
The higher-level C/C++ programming interface of CUDA, built on top of the lower-level Driver API. It handles context initialization and module loading implicitly and provides functions for memory management, data transfer and kernel launch, such as cudaMalloc, cudaMemcpy and cudaLaunchKernel.
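The runtime API can also launch a kernel without the <<<...>>> syntax. A minimal sketch using cudaMalloc, cudaMemcpy and cudaLaunchKernel, where the kernel arguments are passed as an array of pointers in parameter order; scale, scale_with_runtime_api and check are illustrative names.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Illustrative host wrapper (hypothetical name).
std::vector<float> scale_with_runtime_api(std::vector<float> data, float factor) {
    int n = static_cast<int>(data.size());
    if (n == 0) return data;
    size_t bytes = data.size() * sizeof(float);

    float* d_data = nullptr;
    check(cudaMalloc(&d_data, bytes));  // the first runtime call also initializes the context
    check(cudaMemcpy(d_data, data.data(), bytes, cudaMemcpyHostToDevice));

    // Same effect as scale<<<grid, block>>>(d_data, factor, n).
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);
    void* args[] = {&d_data, &factor, &n};  // addresses of the arguments, in parameter order
    check(cudaLaunchKernel((const void*)scale, grid, block, args, 0, 0));  // no dynamic shared memory, default stream

    check(cudaMemcpy(data.data(), d_data, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_data));
    return data;
}
```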
Stream
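A sequence of GPU operations (kernel launches, memory copies, event records) that execute in the order they were issued. Operations in different streams have no implied ordering relative to each other, so they may run concurrently: kernels from different streams can execute at the same time, and memory transfers can overlap with computation, subject to hardware support.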
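A sketch of overlapping transfers and computation with two streams, assuming page-locked host memory (allocated with cudaMallocHost) so that cudaMemcpyAsync is truly asynchronous, and an input whose length divides evenly into chunks. Whether copies and kernels actually overlap depends on the GPU's copy engines; the result is the same either way. square, square_with_streams and check are illustrative names.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Squares each element of one chunk.
__global__ void square(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

// Illustrative host wrapper (hypothetical name): chunks alternate between two streams,
// so the copies of one chunk can overlap with the kernel of another.
std::vector<float> square_with_streams(const std::vector<float>& input, int num_chunks) {
    int n = static_cast<int>(input.size());
    assert(n > 0 && num_chunks > 0 && n % num_chunks == 0);  // simplification: equal chunks
    int chunk = n / num_chunks;
    size_t chunk_bytes = chunk * sizeof(float);

    float* h_buf = nullptr;  // pinned host memory, required for asynchronous copies
    check(cudaMallocHost((void**)&h_buf, n * sizeof(float)));
    std::copy(input.begin(), input.end(), h_buf);
    float* d_buf = nullptr;
    check(cudaMalloc(&d_buf, n * sizeof(float)));

    cudaStream_t streams[2];
    for (cudaStream_t& s : streams) check(cudaStreamCreate(&s));

    for (int c = 0; c < num_chunks; ++c) {
        cudaStream_t s = streams[c % 2];
        int offset = c * chunk;
        // Ordered within stream s; no ordering relative to the other stream.
        check(cudaMemcpyAsync(d_buf + offset, h_buf + offset, chunk_bytes,
                              cudaMemcpyHostToDevice, s));
        square<<<(chunk + 255) / 256, 256, 0, s>>>(d_buf + offset, chunk);
        check(cudaGetLastError());
        check(cudaMemcpyAsync(h_buf + offset, d_buf + offset, chunk_bytes,
                              cudaMemcpyDeviceToHost, s));
    }
    for (cudaStream_t s : streams) {
        check(cudaStreamSynchronize(s));  // wait for everything queued in s
        check(cudaStreamDestroy(s));
    }

    std::vector<float> out(h_buf, h_buf + n);
    check(cudaFree(d_buf));
    check(cudaFreeHost(h_buf));
    return out;
}
```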
Asynchronous Execution
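Execution mode in which a CUDA call returns control to the host thread immediately, before the GPU has finished the work. Kernel launches are always asynchronous with respect to the host; memory copies are asynchronous when issued with the Async variants (such as cudaMemcpyAsync) on page-locked host memory. The host must synchronize (for example with cudaStreamSynchronize, cudaDeviceSynchronize or events) before using the results. Asynchronous execution lets computation and transfers overlap and keeps the GPU busy.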
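The following sketch shows that a kernel launch returns immediately: the host polls a CUDA event with cudaEventQuery and is free to do other work until the GPU reaches it, then reads the elapsed GPU time with cudaEventElapsedTime. busy_kernel, run_async and check are illustrative names, and the counting loop stands in for real CPU work.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Arbitrary arithmetic that keeps the GPU busy for a while.
__global__ void busy_kernel(float* data, int n, int iterations) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iterations; ++k) v = v * 0.5f + 1.0f;
        data[i] = v;
    }
}

// Illustrative host function (hypothetical name): returns GPU time in milliseconds and
// stores in *host_polls how often the CPU polled while the GPU was still busy.
float run_async(int n, int iterations, long* host_polls) {
    assert(n > 0 && iterations >= 0);
    float* d_data = nullptr;
    check(cudaMalloc(&d_data, n * sizeof(float)));
    check(cudaMemset(d_data, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    check(cudaEventCreate(&start));
    check(cudaEventCreate(&stop));

    check(cudaEventRecord(start));  // queued on the default stream; returns at once
    busy_kernel<<<(n + 255) / 256, 256>>>(d_data, n, iterations);  // returns at once
    check(cudaGetLastError());
    check(cudaEventRecord(stop));

    // The host thread is free while the GPU works; here it only counts polls.
    *host_polls = 0;
    while (cudaEventQuery(stop) == cudaErrorNotReady) {
        ++*host_polls;  // placeholder for useful CPU work
    }
    check(cudaEventSynchronize(stop));  // surfaces any error that ended the polling loop

    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop));
    check(cudaEventDestroy(start));
    check(cudaEventDestroy(stop));
    check(cudaFree(d_data));
    return ms;
}
```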
Texture Memory
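Global memory read through the GPU's texture cache and texture units. It is optimized for 2D and 3D spatial locality and is read-only within a kernel, and the hardware can additionally apply addressing modes (such as clamp or wrap) and linear interpolation. Texture memory is particularly efficient for image processing and for irregular read patterns that do not coalesce well.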
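Texture memory is accessed through texture objects. A minimal sketch, assuming a 1D array of floats in linear device memory read with tex1Dfetch; smooth, smooth_via_texture and check are illustrative names, and the 3-point average clamps its indices by hand rather than relying on texture addressing modes.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// 3-point moving average; every read goes through the texture path.
__global__ void smooth(cudaTextureObject_t tex, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int left = max(i - 1, 0);       // clamp at the edges
        int right = min(i + 1, n - 1);
        float sum = tex1Dfetch<float>(tex, left) + tex1Dfetch<float>(tex, i) +
                    tex1Dfetch<float>(tex, right);
        out[i] = sum / 3.0f;
    }
}

// Illustrative host wrapper (hypothetical name).
std::vector<float> smooth_via_texture(const std::vector<float>& input) {
    int n = static_cast<int>(input.size());
    assert(n > 0);
    size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, input.data(), bytes, cudaMemcpyHostToDevice));

    // Resource description: the texture reads from a plain linear buffer of floats.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_in;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = bytes;

    // Texture description: return the stored element values unchanged.
    cudaTextureDesc tex_desc = {};
    tex_desc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    check(cudaCreateTextureObject(&tex, &res, &tex_desc, nullptr));

    smooth<<<(n + 255) / 256, 256>>>(tex, d_out, n);
    check(cudaGetLastError());

    std::vector<float> out(n);
    check(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    check(cudaDestroyTextureObject(tex));
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return out;
}
```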
Constant Memory
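A small (64 KB), cached memory space that is read-only for device code; it is declared with __constant__ and written from the host (for example with cudaMemcpyToSymbol). It is optimized for broadcast: it is most efficient when all threads of a warp read the same address, while reads of different addresses within a warp are serialized.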
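A sketch of the broadcast pattern, assuming a cubic polynomial whose coefficients are the same for every thread: they are stored in a __constant__ array, copied from the host with cudaMemcpyToSymbol, and read in the same order by all threads of a warp. c_coeffs, eval_poly, eval_poly_on_gpu and check are illustrative names.

```cuda
#include <array>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

constexpr int kDegree = 3;                 // cubic polynomial
__constant__ float c_coeffs[kDegree + 1];  // c0 + c1*x + c2*x^2 + c3*x^3, shared by all threads

__global__ void eval_poly(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Horner's rule: at each step every thread reads the same c_coeffs[k],
        // the broadcast pattern constant memory is optimized for.
        float xi = x[i];
        float acc = c_coeffs[kDegree];
        for (int k = kDegree - 1; k >= 0; --k) {
            acc = acc * xi + c_coeffs[k];
        }
        y[i] = acc;
    }
}

// Illustrative host wrapper (hypothetical name).
std::vector<float> eval_poly_on_gpu(const std::vector<float>& x,
                                    const std::array<float, kDegree + 1>& coeffs) {
    int n = static_cast<int>(x.size());
    assert(n > 0);
    // Write the coefficients into the __constant__ symbol before the launch.
    check(cudaMemcpyToSymbol(c_coeffs, coeffs.data(), sizeof(c_coeffs)));

    size_t bytes = n * sizeof(float);
    float* d_x = nullptr;
    float* d_y = nullptr;
    check(cudaMalloc(&d_x, bytes));
    check(cudaMalloc(&d_y, bytes));
    check(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice));

    eval_poly<<<(n + 255) / 256, 256>>>(d_x, d_y, n);
    check(cudaGetLastError());

    std::vector<float> y(n);
    check(cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_x));
    check(cudaFree(d_y));
    return y;
}
```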
Occupancy
The ratio of active warps on a Streaming Multiprocessor to the maximum number of warps the SM can hold. It is limited by the block size, the registers used per thread and the shared memory used per block. Higher occupancy helps hide memory latency, but it does not by itself guarantee better performance.
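The runtime can compute theoretical occupancy for a specific kernel. A sketch assuming device 0 and an illustrative saxpy kernel: cudaOccupancyMaxActiveBlocksPerMultiprocessor gives the resident blocks per SM, which are converted to warps and divided by the SM's warp capacity, and cudaOccupancyMaxPotentialBlockSize suggests an occupancy-maximizing block size. theoretical_occupancy, suggested_block_size and check are hypothetical helper names.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Any kernel works; its register and shared-memory usage determine the result.
__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Illustrative helper (hypothetical name): theoretical occupancy of saxpy on device 0.
double theoretical_occupancy(int block_size) {
    cudaDeviceProp prop;
    check(cudaGetDeviceProperties(&prop, 0));

    int blocks_per_sm = 0;  // resident blocks per SM allowed by the resource limits
    check(cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm, saxpy, block_size, 0));

    int warps_per_block = (block_size + prop.warpSize - 1) / prop.warpSize;
    int active_warps = blocks_per_sm * warps_per_block;
    int max_warps = prop.maxThreadsPerMultiProcessor / prop.warpSize;
    return static_cast<double>(active_warps) / max_warps;
}

// Illustrative helper (hypothetical name): block size the runtime suggests for saxpy.
int suggested_block_size() {
    int min_grid_size = 0;
    int block_size = 0;
    check(cudaOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size, saxpy, 0, 0));
    return block_size;
}
```

This is theoretical occupancy derived from resource limits; the occupancy achieved at run time, as reported by a profiler, can be lower.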
Atomic Operations
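Read-modify-write operations on a single word in global or shared memory that are performed as one indivisible step, so concurrent updates from different threads are never lost. Examples are atomicAdd, atomicMax and atomicCAS. They are commonly used for counters, histograms and reductions; heavily contended atomics are serialized, which can limit performance.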
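A histogram is the textbook case for atomics: many threads may increment the same bin at once, and atomicAdd makes each increment indivisible. A minimal sketch with the illustrative names histogram, histogram_on_gpu and check, assuming integer input values; values outside the bin range are ignored.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

__global__ void histogram(const int* values, int n, unsigned int* bins, int num_bins) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = values[i];
        if (v >= 0 && v < num_bins) {
            atomicAdd(&bins[v], 1u);  // concurrent increments of one bin are not lost
        }
    }
}

// Illustrative host wrapper (hypothetical name).
std::vector<unsigned int> histogram_on_gpu(const std::vector<int>& values, int num_bins) {
    int n = static_cast<int>(values.size());
    assert(n > 0 && num_bins > 0);
    int* d_values = nullptr;
    unsigned int* d_bins = nullptr;
    check(cudaMalloc(&d_values, n * sizeof(int)));
    check(cudaMalloc(&d_bins, num_bins * sizeof(unsigned int)));
    check(cudaMemcpy(d_values, values.data(), n * sizeof(int), cudaMemcpyHostToDevice));
    check(cudaMemset(d_bins, 0, num_bins * sizeof(unsigned int)));  // start all counts at 0

    histogram<<<(n + 255) / 256, 256>>>(d_values, n, d_bins, num_bins);
    check(cudaGetLastError());

    std::vector<unsigned int> bins(num_bins);
    check(cudaMemcpy(bins.data(), d_bins, num_bins * sizeof(unsigned int), cudaMemcpyDeviceToHost));
    check(cudaFree(d_values));
    check(cudaFree(d_bins));
    return bins;
}
```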
cuBLAS
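NVIDIA's GPU-accelerated implementation of the BLAS (Basic Linear Algebra Subroutines) API, covering vector (Level 1), matrix-vector (Level 2) and matrix-matrix (Level 3) operations. Following the BLAS convention, it stores matrices in column-major order. cuBLAS provides highly optimized routines for matrix and vector computations on NVIDIA GPUs.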
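A minimal sketch of calling cuBLAS, assuming the cuBLAS v2 API header cublas_v2.h and linking with -lcublas. It computes y = alpha * x + y with cublasSaxpy; saxpy_cublas, check and check_cublas are illustrative names.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Illustrative helpers: abort with a readable message if a CUDA or cuBLAS call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}
static void check_cublas(cublasStatus_t status) {
    if (status != CUBLAS_STATUS_SUCCESS) {
        std::fprintf(stderr, "cuBLAS error: %d\n", static_cast<int>(status));
        std::exit(EXIT_FAILURE);
    }
}

// Illustrative host wrapper (hypothetical name): returns alpha * x + y.
std::vector<float> saxpy_cublas(float alpha, const std::vector<float>& x, std::vector<float> y) {
    assert(x.size() == y.size() && !x.empty());
    int n = static_cast<int>(x.size());
    size_t bytes = n * sizeof(float);
    float* d_x = nullptr;
    float* d_y = nullptr;
    check(cudaMalloc(&d_x, bytes));
    check(cudaMalloc(&d_y, bytes));
    check(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice));

    cublasHandle_t handle;
    check_cublas(cublasCreate(&handle));
    // alpha is read from host memory (the default pointer mode); increments of 1 = contiguous vectors.
    check_cublas(cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1));
    // The handle uses the default stream, so this blocking copy runs after the cuBLAS call.
    check(cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost));
    check_cublas(cublasDestroy(handle));

    check(cudaFree(d_x));
    check(cudaFree(d_y));
    return y;
}
```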
cuFFT
The CUDA Fast Fourier Transform library, offering high-performance GPU implementations of discrete Fourier transforms. cuFFT supports 1D, 2D and 3D transforms of complex and real data, in single and double precision, for arbitrary sizes and in batches.
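A minimal sketch of a 1D single-precision complex-to-complex transform with cuFFT, assuming linking with -lcufft. A plan is created for the transform size, executed in place, and destroyed; fft_forward, check and check_cufft are illustrative names.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Illustrative helpers: abort with a readable message if a CUDA or cuFFT call fails.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}
static void check_cufft(cufftResult result) {
    if (result != CUFFT_SUCCESS) {
        std::fprintf(stderr, "cuFFT error: %d\n", static_cast<int>(result));
        std::exit(EXIT_FAILURE);
    }
}

// Illustrative host wrapper (hypothetical name): in-place forward 1D C2C transform.
std::vector<cufftComplex> fft_forward(std::vector<cufftComplex> signal) {
    int n = static_cast<int>(signal.size());
    assert(n > 0);
    size_t bytes = n * sizeof(cufftComplex);
    cufftComplex* d_data = nullptr;
    check(cudaMalloc(&d_data, bytes));
    check(cudaMemcpy(d_data, signal.data(), bytes, cudaMemcpyHostToDevice));

    cufftHandle plan;
    check_cufft(cufftPlan1d(&plan, n, CUFFT_C2C, 1));                // length n, complex float, batch of 1
    check_cufft(cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD));  // in place, unnormalized
    check(cudaMemcpy(signal.data(), d_data, bytes, cudaMemcpyDeviceToHost));

    check_cufft(cufftDestroy(plan));
    check(cudaFree(d_data));
    return signal;
}
```

cuFFT transforms are unnormalized: a forward transform followed by an inverse transform (CUFFT_INVERSE) returns the input multiplied by n.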