GPU Computing for AI - 인공지능 용어집

📖

용어

CUDA

Parallel computing architecture and programming interface created by NVIDIA, allowing developers to use GPUs for general-purpose computing through extensions to the C/C++ language.

📖

용어

Tensor Core

Specialized computing units integrated into modern NVIDIA GPUs, designed to exponentially accelerate matrix multiplication and addition operations, fundamental for deep neural networks.

📖

용어

ROCm

Open source computing platform for AMD GPUs, offering a complete ecosystem of programming languages (HIP), libraries (MIOpen), and tools for high-performance computing and AI.

📖

용어

OpenCL

Open standard for writing programs that run on heterogeneous platforms, including CPUs, GPUs, and other processors, defining a language based on C99 and APIs for device management.

📖

용어

cuDNN

GPU-accelerated library of primitives for deep neural networks, developed by NVIDIA, providing highly optimized implementations for convolution, pooling, and normalization routines.

📖

용어

Memory Bandwidth

Maximum data transfer rate between the GPU and its video memory (VRAM), measured in GB/s, constituting a critical factor for the performance of intensive computations and training of large AI models.

📖

용어

Kernel

Main function executed on the GPU in a parallel computing program, launched on a grid of threads and designed to process a specific portion of data simultaneously.

📖

용어

Warp

Group of 32 threads executed in SIMT (Single Instruction, Multiple Thread) mode on NVIDIA GPUs, sharing the same instruction stream and constituting the basic scheduling unit for parallel execution.

📖

용어

Stream Multiprocessor (SM)

Basic computing unit on an NVIDIA GPU, containing cores, shared memory units, and schedulers, capable of simultaneously executing multiple thread blocks and managing their execution.

📖

용어

Shared Memory

Fast, low-latency memory space shared among threads within the same block on a GPU, enabling collaboration and reducing accesses to the much slower global memory.

📖

용어

Unified Memory

Memory management technology that creates a single address space between the CPU and GPU, eliminating the need for explicit data copies and simplifying the development of heterogeneous applications.

📖

용어

NVLink

High-bandwidth interconnect technology developed by NVIDIA, enabling direct and fast communication between multiple GPUs, surpassing the limitations of the PCIe bus for distributed computing.

📖

용어

FP16 (Half-Precision)

16-bit floating-point number format used to accelerate computations and reduce memory footprint in neural networks, at the cost of a slight precision loss that is often acceptable.

📖

용어

CUDA Graphs

Technology that allows capturing an entire sequence of CUDA operations in a graph, then re-executing it with minimal overhead, reducing kernel launch costs for repetitive workloads.

📖

용어

HIP

Programming API and compilation language developed by AMD, designed as a portable alternative to CUDA, enabling easier migration of CUDA code to AMD GPUs.

📖

용어

MIOpen

Optimization library for deep neural networks on AMD's ROCm platform, providing high-performance implementations for convolution, pooling, and normalization layers.

📖

용어

Version number describing the characteristics and features of an NVIDIA GPU, including the number of cores, architecture, supported instructions, and computing capabilities, essential for software compatibility.

📖

용어

Coalesced Memory Access

Memory access optimization where adjacent threads in a warp access contiguous memory locations, allowing these requests to be combined into a single wide and efficient memory transaction.

AI 용어집