CUDA Programming - AI Glossarium

Kernel

CUDA function executed on the GPU by a large number of threads simultaneously. The kernel is launched from the CPU and executed in parallel on the GPU device with a specific grid and block configuration.
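As an illustration of the definition above, here is a minimal sketch of a kernel and its launch. The names `scale` and `scale_on_gpu` are hypothetical, and the block size of 256 threads is an arbitrary assumption, not a recommendation.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: every thread scales one element of the array.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) data[i] *= factor;                   // guard the last, partial block
}

// Hypothetical host helper: scales `host` in place on the GPU.
// Returns false if any CUDA call fails.
bool scale_on_gpu(std::vector<float> &host, float factor) {
    int n = static_cast<int>(host.size());
    if (n == 0) return true;
    size_t bytes = n * sizeof(float);
    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int block = 256;                     // threads per block (assumed value)
        int grid = (n + block - 1) / block;  // enough blocks to cover n elements
        scale<<<grid, block>>>(dev, factor, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

The `<<<grid, block>>>` syntax carries the grid and block configuration mentioned in the definition; the copy back to the host waits for the kernel to finish.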

Thread

Basic execution unit in CUDA, representing a single sequence of instructions executed on a GPU processor core. Threads are organized into blocks and execute the same code on different data.

Block

Collection of threads that can communicate with each other via shared memory and synchronize their execution. All threads of a block run on the same Streaming Multiprocessor (SM); the blocks themselves are organized into a grid and may be scheduled on different SMs in any order.

Grid

Set of thread blocks launched for a single CUDA kernel; its dimensions, together with the block dimensions, make up the kernel's execution configuration. The grid is the top level of the CUDA thread hierarchy.
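The thread, block, and grid levels can be seen together in a 2D launch. The sketch below is only an illustration: `fill_index` and `fill_index_on_gpu` are hypothetical names, the 16x16 block shape is an assumption, and `width` and `height` are assumed to be positive.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread derives its (row, col) position from its block and thread
// indices and writes the flattened index into the output array.
__global__ void fill_index(int *out, int width, int height) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) out[row * width + col] = row * width + col;
}

// Hypothetical helper: returns the filled array, or an empty vector on failure.
// Assumes width > 0 and height > 0.
std::vector<int> fill_index_on_gpu(int width, int height) {
    std::vector<int> host(static_cast<size_t>(width) * height);
    size_t bytes = host.size() * sizeof(int);
    int *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};
    dim3 block(16, 16);                             // 256 threads per block
    dim3 grid((width + block.x - 1) / block.x,      // blocks needed along x
              (height + block.y - 1) / block.y);    // blocks needed along y
    fill_index<<<grid, block>>>(dev, width, height);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    if (!ok) host.clear();
    return host;
}
```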

Warp

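Group of 32 threads that an SM schedules and executes together in SIMT (Single Instruction, Multiple Threads) mode. Threads in a warp that follow the same control-flow path execute the same instruction together; when they diverge at a branch, the paths are executed one after another, which reduces throughput.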
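One place where the warp becomes visible in code is the warp shuffle intrinsics, which let the 32 lanes exchange register values directly. The following is a sketch under the assumption of a single, fully active warp; `warp_sum`, `sum_lane_ids`, and `sum_lane_ids_on_gpu` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Sums one value per lane across a full warp using register shuffles.
// After the loop, lane 0 holds the total of all 32 lanes.
__device__ int warp_sum(int value) {
    for (int offset = 16; offset > 0; offset /= 2)
        value += __shfl_down_sync(0xffffffffu, value, offset);
    return value;
}

// Each lane contributes its own lane id: 0 + 1 + ... + 31.
__global__ void sum_lane_ids(int *result) {
    int lane = threadIdx.x % warpSize;
    int total = warp_sum(lane);
    if (lane == 0) *result = total;
}

// Hypothetical helper: launches exactly one warp; returns -1 on failure.
int sum_lane_ids_on_gpu() {
    int *dev = nullptr;
    int host = -1;
    if (cudaMalloc(&dev, sizeof(int)) != cudaSuccess) return -1;
    sum_lane_ids<<<1, 32>>>(dev);
    if (cudaGetLastError() != cudaSuccess ||
        cudaMemcpy(&host, dev, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        host = -1;
    cudaFree(dev);
    return host;
}
```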

Shared Memory

Fast, small-sized memory shared by all threads of the same block, enabling efficient communication between threads. Shared memory is much faster than global memory but limited in size per block.
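A common use of shared memory is a per-block reduction, sketched below. This is one possible layout, not the only one: `block_sum`, `sum_on_gpu`, and the constant `kBlock` are hypothetical, and the final addition of the per-block partial sums is done on the host for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlock = 256;  // threads per block; must be a power of two here

// Each block loads up to kBlock elements into shared memory, reduces them
// with a tree pattern, and writes one partial sum.
__global__ void block_sum(const float *in, float *partial, int n) {
    __shared__ float tile[kBlock];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // all loads must finish before any thread reads the tile
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = tile[0];
}

// Hypothetical helper: writes the sum of `host` to *total; false on failure.
bool sum_on_gpu(const std::vector<float> &host, float *total) {
    int n = static_cast<int>(host.size());
    if (n == 0) { *total = 0.0f; return true; }
    int grid = (n + kBlock - 1) / kBlock;
    std::vector<float> partial(grid);
    float *d_in = nullptr, *d_partial = nullptr;
    bool ok = cudaMalloc(&d_in, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_partial, grid * sizeof(float)) == cudaSuccess &&
              cudaMemcpy(d_in, host.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        block_sum<<<grid, kBlock>>>(d_in, d_partial, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(partial.data(), d_partial, grid * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);       // cudaFree(nullptr) is a no-op
    cudaFree(d_partial);
    if (!ok) return false;
    *total = 0.0f;
    for (float p : partial) *total += p;  // combine per-block results on the host
    return true;
}
```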

Global Memory

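Main device memory accessible by all threads, with large capacity but high latency. The CPU reads and writes it through runtime calls such as cudaMemcpy (or through managed memory) rather than by ordinary pointer dereference. Global memory allocations persist between kernel launches until they are freed and constitute the main data storage area.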

CUDA Runtime API

High-level programming interface that simplifies CUDA application development by handling initialization, context management, and module loading implicitly. It provides functions such as cudaMalloc, cudaMemcpy, and cudaLaunchKernel.
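To illustrate the implicit initialization, the sketch below copies a buffer to the device and back using only runtime calls, with no explicit setup step. The helpers `check` and `round_trip` are hypothetical, and the input is assumed to be non-empty.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical helper: reports a failed runtime call and returns true
// only when `err` is cudaSuccess.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess)
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return err == cudaSuccess;
}

// Hypothetical helper: copies `src` to global memory and back into `dst`.
// Assumes `src` is non-empty.
bool round_trip(const std::vector<int> &src, std::vector<int> &dst) {
    size_t bytes = src.size() * sizeof(int);
    dst.assign(src.size(), 0);
    int *dev = nullptr;
    // No explicit initialization: the first runtime call sets up the device.
    if (!check(cudaMalloc(&dev, bytes), "cudaMalloc")) return false;
    bool ok = check(cudaMemcpy(dev, src.data(), bytes, cudaMemcpyHostToDevice),
                    "cudaMemcpy (host to device)") &&
              check(cudaMemcpy(dst.data(), dev, bytes, cudaMemcpyDeviceToHost),
                    "cudaMemcpy (device to host)");
    check(cudaFree(dev), "cudaFree");
    return ok;
}
```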

Stream

Sequence of operations executed on the GPU in the order in which they were issued. Operations in different streams have no ordering relative to each other, so kernels can run concurrently and memory transfers can overlap with computation.

Asynchronous Execution

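CUDA execution mode where operations return immediately to the CPU without waiting for their completion on the GPU. The CPU must synchronize explicitly (for example with cudaStreamSynchronize or cudaDeviceSynchronize) before it uses the results. Asynchronous execution allows overlapping computations and transfers to maximize GPU utilization.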
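Streams and asynchronous calls are typically used together. The sketch below splits a buffer into two halves and issues copy, kernel, and copy-back on a separate stream for each half. The names `add_one` and `two_stream_demo` are hypothetical; whether the two streams actually overlap depends on the hardware, and `n_per_stream` is assumed to be positive.

```cuda
#include <cuda_runtime.h>

__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: returns true if every element ends up equal to 1.
bool two_stream_demo(int n_per_stream) {
    const int kStreams = 2;
    int n = n_per_stream * kStreams;
    size_t chunk = n_per_stream * sizeof(int);
    int *host = nullptr, *dev = nullptr;
    // Pinned (page-locked) host memory lets the copies run asynchronously.
    if (cudaMallocHost(&host, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) {
        cudaFreeHost(host);
        return false;
    }
    for (int i = 0; i < n; ++i) host[i] = 0;

    cudaStream_t streams[kStreams];
    int created = 0;
    bool ok = true;
    for (; created < kStreams; ++created)
        if (cudaStreamCreate(&streams[created]) != cudaSuccess) { ok = false; break; }

    if (ok) {
        for (int s = 0; s < kStreams; ++s) {
            int off = s * n_per_stream;
            // Each call returns to the CPU immediately; work within one
            // stream runs in issue order.
            cudaMemcpyAsync(dev + off, host + off, chunk, cudaMemcpyHostToDevice, streams[s]);
            add_one<<<(n_per_stream + 255) / 256, 256, 0, streams[s]>>>(dev + off, n_per_stream);
            cudaMemcpyAsync(host + off, dev + off, chunk, cudaMemcpyDeviceToHost, streams[s]);
        }
        // Wait for all streams before the CPU reads the results.
        ok = cudaDeviceSynchronize() == cudaSuccess;
        for (int i = 0; ok && i < n; ++i) ok = (host[i] == 1);
    }
    for (int s = 0; s < created; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(dev);
    cudaFreeHost(host);
    return ok;
}
```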

Texture Memory

Read-only access path to device memory, served by a dedicated cache optimized for 2D or 3D spatial locality. Texture memory is particularly efficient for image processing and for access patterns that are spatially local but do not coalesce well.

Constant Memory

Small, cached read-only memory (64 KB) optimized for broadcast accesses. It is particularly efficient when all threads in a warp read the same address; reads of different addresses within a warp are serialized.
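A typical broadcast pattern is a small table of coefficients that every thread reads. The sketch below evaluates a quadratic polynomial; `kCoeffs`, `eval_poly`, and `eval_poly_on_gpu` are hypothetical names chosen for the illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

__constant__ float kCoeffs[3];  // polynomial coefficients c0, c1, c2

// Every thread reads the same three coefficients, so each read is a broadcast.
__global__ void eval_poly(const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = kCoeffs[0] + x[i] * (kCoeffs[1] + x[i] * kCoeffs[2]);
}

// Hypothetical helper: computes y = c0 + c1*x + c2*x^2; false on failure.
bool eval_poly_on_gpu(const float coeffs[3], const std::vector<float> &x,
                      std::vector<float> &y) {
    int n = static_cast<int>(x.size());
    y.assign(x.size(), 0.0f);
    if (n == 0) return true;
    size_t bytes = n * sizeof(float);
    float *d_x = nullptr, *d_y = nullptr;
    bool ok = cudaMemcpyToSymbol(kCoeffs, coeffs, 3 * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_x, bytes) == cudaSuccess &&
              cudaMalloc(&d_y, bytes) == cudaSuccess &&
              cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        eval_poly<<<(n + 255) / 256, 256>>>(d_x, d_y, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);  // cudaFree(nullptr) is a no-op
    cudaFree(d_y);
    return ok;
}
```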

Occupancy

Measure of the ratio between the number of active warps and the maximum number of warps that can be resident on a Streaming Multiprocessor. High occupancy does not necessarily guarantee better performance but helps hide latency.
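The ratio in the definition can be estimated with the runtime's occupancy calculator. The sketch below computes the theoretical occupancy of a trivial kernel on device 0; `dummy_kernel` and `theoretical_occupancy` are hypothetical names, and the value is an upper bound that says nothing about achieved performance.

```cuda
#include <cuda_runtime.h>

__global__ void dummy_kernel(float *data) {
    data[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f;
}

// Hypothetical helper: theoretical occupancy of dummy_kernel on device 0
// for the given block size, as a value in (0, 1]. Returns -1 on failure.
double theoretical_occupancy(int block_size) {
    cudaDeviceProp prop;
    int blocks_per_sm = 0;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1.0;
    // Maximum number of resident blocks per SM for this kernel and block size,
    // with no dynamic shared memory.
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocks_per_sm, dummy_kernel, block_size, 0) != cudaSuccess)
        return -1.0;
    int warps_per_block = (block_size + prop.warpSize - 1) / prop.warpSize;
    int max_warps_per_sm = prop.maxThreadsPerMultiProcessor / prop.warpSize;
    // Active warps divided by the maximum number of resident warps per SM.
    return static_cast<double>(blocks_per_sm * warps_per_block) / max_warps_per_sm;
}
```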

Atomic Operations

Read-modify-write operations (such as atomicAdd) executed as a single indivisible step on global or shared memory, so that concurrent updates from different threads are never lost. They are commonly used for reductions, histograms, and other concurrent data updates.
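A byte histogram is a simple case where many threads update the same counters. The sketch below uses `atomicAdd` on global memory; `histogram` and `histogram_on_gpu` are hypothetical names, and a production version would likely accumulate in shared memory first.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Many threads may hit the same bin; atomicAdd keeps every increment.
__global__ void histogram(const unsigned char *data, int n, unsigned int *bins) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&bins[data[i]], 1u);
}

// Hypothetical helper: returns 256 bin counts, or an empty vector on failure.
std::vector<unsigned int> histogram_on_gpu(const std::vector<unsigned char> &data) {
    const int kBins = 256;
    std::vector<unsigned int> bins(kBins, 0);
    int n = static_cast<int>(data.size());
    if (n == 0) return bins;
    unsigned char *d_data = nullptr;
    unsigned int *d_bins = nullptr;
    bool ok = cudaMalloc(&d_data, n) == cudaSuccess &&
              cudaMalloc(&d_bins, kBins * sizeof(unsigned int)) == cudaSuccess &&
              cudaMemcpy(d_data, data.data(), n, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemset(d_bins, 0, kBins * sizeof(unsigned int)) == cudaSuccess;
    if (ok) {
        histogram<<<(n + 255) / 256, 256>>>(d_data, n, d_bins);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(bins.data(), d_bins, kBins * sizeof(unsigned int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);  // cudaFree(nullptr) is a no-op
    cudaFree(d_bins);
    if (!ok) bins.clear();
    return bins;
}
```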

cuBLAS

CUDA Basic Linear Algebra Subprograms library providing optimized GPU implementations for basic linear algebra operations. cuBLAS significantly accelerates matrix and vector computations on NVIDIA architectures.
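As a small usage sketch, the code below calls `cublasSaxpy` to compute y = alpha * x + y on the device. The wrapper `saxpy_on_gpu` is hypothetical, and the program is assumed to be linked against cuBLAS (for example with `-lcublas`).

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: y = alpha * x + y, computed by cuBLAS on the GPU.
// Returns false on a size mismatch, empty input, or any library failure.
bool saxpy_on_gpu(float alpha, const std::vector<float> &x, std::vector<float> &y) {
    if (x.size() != y.size() || x.empty()) return false;
    int n = static_cast<int>(x.size());
    size_t bytes = n * sizeof(float);
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return false;
    float *d_x = nullptr, *d_y = nullptr;
    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess &&
              cudaMalloc(&d_y, bytes) == cudaSuccess &&
              cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // alpha is passed by pointer; the strides of 1 mean contiguous vectors.
        ok = cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1) == CUBLAS_STATUS_SUCCESS &&
             cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);  // cudaFree(nullptr) is a no-op
    cudaFree(d_y);
    cublasDestroy(handle);
    return ok;
}
```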

cuFFT

CUDA Fast Fourier Transform library offering high-performance GPU implementations for discrete Fourier transforms. cuFFT supports 1D, 2D, and 3D transforms with different precisions and sizes.
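A minimal sketch of a 1D single-precision complex-to-complex transform follows. The wrapper `fft_forward_on_gpu` is hypothetical, the transform is done in place, and the program is assumed to be linked against cuFFT (for example with `-lcufft`).

```cuda
#include <cufft.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: replaces `signal` with its forward DFT (unnormalized).
// Returns false on empty input or any library failure.
bool fft_forward_on_gpu(std::vector<cufftComplex> &signal) {
    int n = static_cast<int>(signal.size());
    if (n == 0) return false;
    size_t bytes = n * sizeof(cufftComplex);
    cufftComplex *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    cufftHandle plan;
    // One 1D complex-to-complex transform of length n.
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
        cudaFree(dev);
        return false;
    }
    bool ok = cudaMemcpy(dev, signal.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cufftExecC2C(plan, dev, dev, CUFFT_FORWARD) == CUFFT_SUCCESS &&
              cudaMemcpy(signal.data(), dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cufftDestroy(plan);
    cudaFree(dev);
    return ok;
}
```

For a constant input of n ones, the forward transform concentrates the energy in bin 0, which makes a convenient sanity check.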
