CUDA Programming - AI Glossary

Kernel

CUDA function, declared with the __global__ qualifier, that is executed on the GPU by many threads in parallel. A kernel is typically launched from the CPU (host) with an execution configuration that specifies the grid and block dimensions.
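As an illustration of the launch described above, here is a minimal sketch of an element-wise vector addition. The names vecAdd and addOnGpu are hypothetical, the block size of 256 is an arbitrary choice, and the helper assumes two non-empty inputs of equal length. Error checking is left out to keep the sketch short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) out[i] = a[i] + b[i];                // guard threads past the end
}

// Hypothetical host helper (assumes a.size() == b.size() and n > 0).
std::vector<float> addOnGpu(const std::vector<float>& a,
                            const std::vector<float>& b) {
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int block = 256;                      // threads per block (arbitrary choice)
    int grid = (n + block - 1) / block;   // enough blocks to cover n elements
    vecAdd<<<grid, block>>>(dA, dB, dOut, n);

    std::vector<float> out(n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

Calling addOnGpu with two equal-length vectors returns their element-wise sum. Production code would check the cudaError_t returned by each call.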

Thread

Basic execution unit in CUDA, representing a single sequence of instructions executed on a GPU processor core. Threads are organized into blocks; every thread runs the same kernel code and uses its built-in indices (threadIdx, blockIdx) to select the data it works on.

Block

Collection of threads that can communicate through shared memory and synchronize their execution, for example with __syncthreads(). All threads of a block run on the same Streaming Multiprocessor (SM); the blocks of a grid are distributed across the available SMs and may execute in any order.

Grid

Set of all thread blocks launched for a single kernel; its dimensions, together with the block dimensions, form the kernel's execution configuration. The grid is the top level of the CUDA thread hierarchy.
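The thread, block, and grid levels can be seen together in a small sketch that covers a 2D domain. The names writeLinearIndex and linearIndices are hypothetical, and the 16x16 block shape is only an assumption; each thread derives its (x, y) position from blockIdx, blockDim, and threadIdx and writes its own linear index.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread computes its 2D position in the grid and stores its linear index.
__global__ void writeLinearIndex(int* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        out[y * width + x] = y * width + x;
    }
}

// Hypothetical host helper (assumes width > 0 and height > 0).
std::vector<int> linearIndices(int width, int height) {
    size_t bytes = static_cast<size_t>(width) * height * sizeof(int);
    int* dev = nullptr;
    cudaMalloc(&dev, bytes);

    dim3 block(16, 16);                                // 256 threads per block
    dim3 grid((width + block.x - 1) / block.x,         // blocks along x
              (height + block.y - 1) / block.y);       // blocks along y
    writeLinearIndex<<<grid, block>>>(dev, width, height);

    std::vector<int> host(static_cast<size_t>(width) * height, -1);
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

The bounds check matters whenever the domain size is not a multiple of the block shape, because the last blocks then contain threads that fall outside the domain.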

Warp

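Group of 32 threads that an SM schedules and executes together in SIMT (Single Instruction, Multiple Threads) mode. Threads in a warp issue the same instruction together; when they take different branches, the paths are executed one after another with the inactive threads masked off, which reduces throughput.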
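Because the threads of a warp execute together, they can exchange register values directly. The following sketch sums 32 integers within one warp using __shfl_down_sync. The names warpSum and sumOneWarp are hypothetical, and the helper assumes exactly 32 input values and a warp size of 32.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Launched with a single warp: lane 0 ends up holding the sum of all 32 values.
__global__ void warpSum(const int* in, int* out) {
    int v = in[threadIdx.x];
    // Tree reduction: each step adds the value held by the lane `offset` above.
    for (int offset = 16; offset > 0; offset >>= 1) {
        v += __shfl_down_sync(0xffffffffu, v, offset);
    }
    if (threadIdx.x == 0) *out = v;
}

// Hypothetical host helper (assumes values.size() == 32).
int sumOneWarp(const std::vector<int>& values) {
    int *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, 32 * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, values.data(), 32 * sizeof(int), cudaMemcpyHostToDevice);

    warpSum<<<1, 32>>>(dIn, dOut);  // one block containing exactly one warp

    int result = 0;
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

No shared memory or __syncthreads() is needed here, since the shuffle instruction itself synchronizes the lanes named in the mask.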

Shared Memory

Small, fast on-chip memory shared by all threads of the same block, enabling efficient communication between them. Shared memory has much lower latency than global memory, but its capacity per block is limited.
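A common use of shared memory is a per-block reduction. The sketch below is one possible version, with the hypothetical names blockSum and sumOnGpu and an assumed power-of-two block size of 256. Each block stages its inputs in shared memory, reduces them in a tree, and writes one partial sum; the host adds the partial sums.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlock = 256;  // assumed block size; must be a power of two here

__global__ void blockSum(const float* in, float* partial, int n) {
    __shared__ float tile[kBlock];                 // one slot per thread
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;    // pad the tail with zeros
    __syncthreads();                               // wait until the tile is full

    // Tree reduction inside the block.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();                           // finish this step everywhere
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = tile[0];
}

// Hypothetical host helper (assumes data is non-empty).
float sumOnGpu(const std::vector<float>& data) {
    int n = static_cast<int>(data.size());
    int blocks = (n + kBlock - 1) / kBlock;
    float *dIn = nullptr, *dPartial = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dPartial, blocks * sizeof(float));
    cudaMemcpy(dIn, data.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, kBlock>>>(dIn, dPartial, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dPartial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);

    float total = 0.0f;
    for (float p : partial) total += p;  // final pass on the host
    return total;
}
```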

Global Memory

Main device memory, accessible by all threads on the GPU and by the CPU through runtime API calls such as cudaMemcpy, with large capacity but high latency. Data in global memory persists across kernel launches until it is freed, which makes it the main data storage area.

CUDA Runtime API

High-level programming interface that simplifies CUDA application development by implicitly handling initialization, context management, and module loading. It provides functions such as cudaMalloc, cudaMemcpy, and cudaLaunchKernel.
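To show the three functions named above working together, here is a minimal sketch that launches a kernel through cudaLaunchKernel instead of the usual triple-chevron syntax and checks the returned cudaError_t values. The names scale and scaleOnGpu are hypothetical, and the block size of 128 is an arbitrary choice.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical host helper: scales `host` in place, returns false on any error.
bool scaleOnGpu(float* host, int n, float factor) {
    size_t bytes = n * sizeof(float);
    float* dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return false;
    }
    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // Kernel arguments are passed as an array of pointers to each argument.
        void* args[] = { &dev, &factor, &n };
        dim3 block(128);
        dim3 grid((n + block.x - 1) / block.x);
        err = cudaLaunchKernel((const void*)scale, grid, block, args, 0, nullptr);
    }
    if (err == cudaSuccess) {
        // Blocking copy: also surfaces errors raised while the kernel ran.
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

Note that no explicit initialization call appears anywhere: the runtime creates its context on the first API call.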

Stream

Sequence of operations that execute on the GPU in the order they were issued. Operations in different streams may run concurrently, which allows kernels to overlap with each other and with memory transfers.

Asynchronous Execution

CUDA execution mode in which calls such as kernel launches return control to the CPU immediately, without waiting for the work to complete on the GPU. Asynchronous execution allows computation and transfers to overlap, which helps maximize GPU utilization.
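One way to combine streams with asynchronous calls is sketched below. The names addOne and incrementWithStreams are hypothetical, and the use of two streams is an arbitrary choice. The data is split into two chunks, and each chunk is copied in, processed, and copied back on its own stream. Pinned host memory from cudaMallocHost is used because asynchronous copies can only overlap with other work when the host buffer is page-locked.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

__global__ void addOne(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical host helper: returns {1, 2, ..., n} (assumes n > 0).
std::vector<int> incrementWithStreams(int n) {
    const int kStreams = 2;
    int* host = nullptr;
    int* dev = nullptr;
    cudaMallocHost(&host, n * sizeof(int));  // pinned (page-locked) host memory
    cudaMalloc(&dev, n * sizeof(int));
    for (int i = 0; i < n; ++i) host[i] = i;

    cudaStream_t streams[kStreams];
    int chunk = (n + kStreams - 1) / kStreams;
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        int offset = s * chunk;
        int count = std::min(chunk, n - offset);
        if (count <= 0) continue;
        size_t bytes = count * sizeof(int);
        // All three calls return immediately; they run in order within the stream.
        cudaMemcpyAsync(dev + offset, host + offset, bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        addOne<<<(count + 255) / 256, 256, 0, streams[s]>>>(dev + offset, count);
        cudaMemcpyAsync(host + offset, dev + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamSynchronize(streams[s]);  // block the CPU until the stream drains
        cudaStreamDestroy(streams[s]);
    }

    std::vector<int> result(host, host + n);
    cudaFreeHost(host);
    cudaFree(dev);
    return result;
}
```

Whether the two chunks actually overlap depends on the device and on the amount of work per chunk; the code only makes the overlap possible.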

Texture Memory

Read-only memory path optimized for accesses with 2D or 3D spatial locality, with automatic data caching. Texture memory is particularly efficient for image processing and for access patterns that are spatially local but not well coalesced.

Constant Memory

Cached memory that is read-only from device code and optimized for broadcast accesses. It is most efficient when all threads in a warp read the same address; reads to different addresses within a warp are serialized.
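The broadcast pattern can be illustrated with polynomial evaluation, where every thread reads the same coefficient at each step. This is a sketch only: the names kCoeffs, evalPoly, and evalCubic are hypothetical, and the polynomial degree is fixed at 3 for simplicity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Coefficients c0..c3 of c0 + c1*x + c2*x^2 + c3*x^3, stored in constant memory.
__constant__ float kCoeffs[4];

__global__ void evalPoly(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float xi = x[i];
    // Horner's rule: at each step all threads read the same kCoeffs[k],
    // so the value is broadcast to the whole warp.
    float acc = kCoeffs[3];
    for (int k = 2; k >= 0; --k) acc = acc * xi + kCoeffs[k];
    y[i] = acc;
}

// Hypothetical host helper (assumes x is non-empty, coeffs has 4 entries).
std::vector<float> evalCubic(const float coeffs[4], const std::vector<float>& x) {
    int n = static_cast<int>(x.size());
    size_t bytes = n * sizeof(float);
    // The host fills constant memory through the symbol, not through a pointer.
    cudaMemcpyToSymbol(kCoeffs, coeffs, 4 * sizeof(float));

    float *dX = nullptr, *dY = nullptr;
    cudaMalloc(&dX, bytes);
    cudaMalloc(&dY, bytes);
    cudaMemcpy(dX, x.data(), bytes, cudaMemcpyHostToDevice);
    evalPoly<<<(n + 255) / 256, 256>>>(dX, dY, n);

    std::vector<float> y(n);
    cudaMemcpy(y.data(), dY, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dX);
    cudaFree(dY);
    return y;
}
```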

Occupancy

Ratio of the number of active warps to the maximum number of warps that can be resident on a Streaming Multiprocessor. Higher occupancy helps hide latency but does not by itself guarantee better performance.
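The ratio defined above can be estimated for a given kernel and block size with the runtime's occupancy query. In this sketch the names doubleValues and theoreticalOccupancy are hypothetical; the result is the theoretical occupancy for the current device, not a measured value.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel whose resource usage is being queried.
__global__ void doubleValues(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: active warps / maximum resident warps per SM.
float theoreticalOccupancy(int blockSize) {
    int device = 0;
    cudaGetDevice(&device);
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);

    // How many blocks of this kernel fit on one SM at this block size,
    // with no dynamic shared memory.
    int blocksPerSm = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, doubleValues,
                                                  blockSize, 0);

    int warpsPerBlock = (blockSize + prop.warpSize - 1) / prop.warpSize;
    int activeWarps = blocksPerSm * warpsPerBlock;
    int maxWarps = prop.maxThreadsPerMultiProcessor / prop.warpSize;
    return static_cast<float>(activeWarps) / static_cast<float>(maxWarps);
}
```

Comparing the value for several block sizes (for example 64, 128, 256) is a quick way to see how the launch configuration affects the number of resident warps.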

Atomic Operations

Read-modify-write operations, such as atomicAdd, that execute indivisibly on global or shared memory, so that concurrent updates from different threads do not interfere with one another. They are commonly used for reductions and other concurrent data updates.
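A histogram is a typical case of concurrent updates, since many threads may increment the same bin. The sketch below uses atomicAdd on global memory; the names histogram256 and histogramOnGpu are hypothetical, and the launch configuration of 64 blocks of 256 threads is an arbitrary choice.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Counts how often each byte value occurs. Several threads may hit the same
// bin at once, so the increment must be atomic.
__global__ void histogram256(const unsigned char* data, int n, unsigned int* bins) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        atomicAdd(&bins[data[i]], 1u);
    }
}

// Hypothetical host helper (assumes data is non-empty).
std::vector<unsigned int> histogramOnGpu(const std::vector<unsigned char>& data) {
    int n = static_cast<int>(data.size());
    unsigned char* dData = nullptr;
    unsigned int* dBins = nullptr;
    cudaMalloc(&dData, n);
    cudaMalloc(&dBins, 256 * sizeof(unsigned int));
    cudaMemcpy(dData, data.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(dBins, 0, 256 * sizeof(unsigned int));  // start all bins at zero

    histogram256<<<64, 256>>>(dData, n, dBins);

    std::vector<unsigned int> bins(256);
    cudaMemcpy(bins.data(), dBins, 256 * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dBins);
    return bins;
}
```

Replacing atomicAdd with a plain `bins[data[i]] += 1` would compile but lose updates whenever two threads touch the same bin at the same time.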

cuBLAS

NVIDIA's GPU implementation of BLAS (Basic Linear Algebra Subprograms), providing optimized routines for basic linear algebra operations. cuBLAS significantly accelerates matrix and vector computations on NVIDIA GPUs.
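As a small example of calling the library, the following sketch computes y = alpha * x + y with cublasSaxpy. The helper name saxpyOnGpu is hypothetical, the inputs are assumed to be non-empty and of equal length, and the program must be linked against cuBLAS (for example with -lcublas).

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical host helper: returns alpha * x + y (assumes x.size() == y.size()).
std::vector<float> saxpyOnGpu(float alpha, const std::vector<float>& x,
                              const std::vector<float>& y) {
    int n = static_cast<int>(x.size());
    size_t bytes = n * sizeof(float);
    float *dX = nullptr, *dY = nullptr;
    cudaMalloc(&dX, bytes);
    cudaMalloc(&dY, bytes);
    cudaMemcpy(dX, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dY, y.data(), bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    // Strides of 1 mean both vectors are stored contiguously.
    // alpha is read from host memory (the default pointer mode).
    cublasSaxpy(handle, n, &alpha, dX, 1, dY, 1);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), dY, bytes, cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dX);
    cudaFree(dY);
    return out;
}
```

In a real application the handle would normally be created once and reused, since creating it is comparatively expensive.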

cuFFT

CUDA Fast Fourier Transform library offering high-performance GPU implementations of the discrete Fourier transform. cuFFT supports 1D, 2D, and 3D transforms at several precisions and for a wide range of sizes.
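A minimal sketch of a 1D single-precision complex-to-complex forward transform is shown below. The helper name forwardFft is hypothetical, the input is assumed to be non-empty, and the program must be linked against cuFFT (for example with -lcufft). The transform is done in place on the device buffer.

```cuda
#include <cufft.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical host helper: unnormalized forward DFT of `signal`.
std::vector<cufftComplex> forwardFft(const std::vector<cufftComplex>& signal) {
    int n = static_cast<int>(signal.size());
    size_t bytes = n * sizeof(cufftComplex);
    cufftComplex* dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, signal.data(), bytes, cudaMemcpyHostToDevice);

    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);           // one transform of length n
    cufftExecC2C(plan, dev, dev, CUFFT_FORWARD);   // in place: input == output

    std::vector<cufftComplex> out(n);
    // Blocking copy on the default stream; it runs after the transform.
    cudaMemcpy(out.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cufftDestroy(plan);
    cudaFree(dev);
    return out;
}
```

For a constant input of n ones, the output should have a value close to n in bin 0 and values close to zero elsewhere, because cuFFT does not normalize its transforms.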
