GPU Kernel Optimization - Artificial Intelligence Glossary

Thread Divergence

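Occurs when threads within the same warp take different execution paths at a branch. The warp then executes each path in turn, with the threads not on that path masked off, so divergence serializes the branch and can significantly reduce parallel performance on the GPU.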
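Here is a minimal CUDA sketch of the effect. It assumes a hypothetical workload in which even and odd elements need different code; `even_path` and `odd_path` are made-up helpers. Both kernels produce the same output. The second one remaps threads to elements so that every warp follows a single path. The compiler may predicate very short branches instead of branching, so divergence costs the most when the branch bodies are substantial.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical per-element work: even and odd indices need different code.
__device__ float even_path(float x) { return sinf(x) * 2.0f; }
__device__ float odd_path(float x)  { return cosf(x) * 0.5f; }

// Divergent: adjacent lanes of a warp have different parities, so each warp
// executes both branches one after the other with half its lanes masked off.
__global__ void parity_divergent(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0) out[i] = even_path(in[i]);
    else            out[i] = odd_path(in[i]);
}

// Warp-uniform: the first half of the block handles the even elements of the
// block's range, the second half the odd ones. With blockDim.x a multiple of 64,
// each warp lies entirely in one half, so all its lanes take the same branch.
__global__ void parity_uniform(const float* in, float* out, int n) {
    int t = threadIdx.x;
    int half = blockDim.x / 2;
    bool even = t < half;
    int i = blockIdx.x * blockDim.x + (even ? 2 * t : 2 * (t - half) + 1);
    if (i >= n) return;
    if (even) out[i] = even_path(in[i]);
    else      out[i] = odd_path(in[i]);
}

// Host helper; error checking omitted for brevity.
std::vector<float> run_parity(const std::vector<float>& h_in, bool uniform) {
    int n = static_cast<int>(h_in.size());
    size_t bytes = n * sizeof(float);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    const int block = 256;                 // multiple of 64, as parity_uniform requires
    int grid = (n + block - 1) / block;
    if (uniform) parity_uniform<<<grid, block>>>(d_in, d_out, n);
    else         parity_divergent<<<grid, block>>>(d_in, d_out, n);
    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```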

Shared Memory Bank Conflicts

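Contention that occurs when multiple threads of the same warp access different addresses in the same shared-memory bank within a single request. The hardware serializes these accesses. Threads reading the same address, by contrast, are served by a single broadcast.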
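The standard fix is padding. Here is a minimal sketch of it on a shared-memory matrix transpose. It assumes 32 banks of 4-byte words, as on current NVIDIA GPUs, and 32 x 32 thread blocks; `transpose_padded` and `transpose` are illustrative names. Without the extra column, the 32 threads of a warp reading one column of `tile[32][32]` would all hit the same bank.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 32;

// Transposes a row-major matrix with `height` rows and `width` columns.
__global__ void transpose_padded(const float* in, float* out, int width, int height) {
    // One padding column shifts each row by one bank, so element (r, c) sits in
    // bank (r * 33 + c) % 32 = (r + c) % 32: a column read touches 32 distinct banks.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];    // row write, conflict-free
    __syncthreads();

    // Swap the block indices so that the global write is also coalesced.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y];  // column read, conflict-free
}

// Host helper; error checking omitted for brevity.
std::vector<float> transpose(const std::vector<float>& h_in, int width, int height) {
    size_t bytes = h_in.size() * sizeof(float);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    transpose_padded<<<grid, block>>>(d_in, d_out, width, height);
    std::vector<float> h_out(h_in.size());
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```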

Warp Scheduling

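The mechanism by which the warp schedulers of each streaming multiprocessor (SM) pick, every cycle, a resident warp whose next instruction is ready to issue. Switching to a ready warp whenever another is stalled keeps the compute units busy and hides memory latency.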

Register Spilling

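Occurs when a kernel needs more registers than the compiler can allocate per thread, so the compiler moves (spills) some values to local memory. Local memory resides in off-chip device memory and is much slower than registers, so heavy spilling can significantly degrade performance.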
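Here is a minimal sketch of the two knobs usually involved; the `horner8` kernel and its coefficients are hypothetical. Compiling with `nvcc -Xptxas -v` makes ptxas report the registers each kernel uses and its bytes of spill stores and loads. `__launch_bounds__` declares the maximum block size and, optionally, a minimum number of resident blocks per multiprocessor, which bounds how many registers the compiler may give each thread. A cap that is too tight is exactly what causes spilling, so recheck the report after changing it.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Build with:  nvcc -Xptxas -v file.cu
// and check "Used N registers" and "N bytes spill stores, N bytes spill loads".

// At most 256 threads per block; request at least 2 resident blocks per SM.
__global__ void __launch_bounds__(256, 2)
horner8(const float* x, float* y, int n) {
    // Hypothetical coefficients c[k] = (-0.5)^k. The loop is fully unrolled, so every
    // index into c is a compile-time constant and the array can stay in registers;
    // a dynamically indexed local array would be placed in local memory instead.
    const float c[8] = {1.0f, -0.5f, 0.25f, -0.125f,
                        0.0625f, -0.03125f, 0.015625f, -0.0078125f};
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float xi = x[i], acc = 0.0f;
#pragma unroll
    for (int k = 7; k >= 0; --k) acc = acc * xi + c[k];   // Horner's rule
    y[i] = acc;
}

// Host helper; error checking omitted for brevity.
std::vector<float> run_horner8(const std::vector<float>& h_x) {
    int n = static_cast<int>(h_x.size());
    size_t bytes = n * sizeof(float);
    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);
    horner8<<<(n + 255) / 256, 256>>>(d_x, d_y, n);   // block size within the declared bound
    std::vector<float> h_y(n);
    cudaMemcpy(h_y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    return h_y;
}
```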

Instruction Throughput

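The number of instructions a multiprocessor can execute per clock cycle. It improves when kernels favor operations with high native throughput and avoid costly ones, such as integer division and modulo or full-precision transcendental functions, wherever a faster alternative is accurate enough.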
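Here is a minimal sketch of trading accuracy for throughput with CUDA's fast-math intrinsics; the logistic function is just an illustrative workload. `__expf` and `__fdividef` compile to short hardware approximation sequences instead of the longer, accurate `expf` and IEEE division, at the cost of a few ulps of error. Whether that error is acceptable depends on the application. `nvcc -use_fast_math` applies similar substitutions to a whole file.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Accurate version: expf and IEEE-compliant division.
__global__ void sigmoid_precise(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = 1.0f / (1.0f + expf(-x[i]));
}

// Fast version: intrinsics with reduced accuracy but higher throughput.
__global__ void sigmoid_fast(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = __fdividef(1.0f, 1.0f + __expf(-x[i]));
}

// Host helper; error checking omitted for brevity.
std::vector<float> run_sigmoid(const std::vector<float>& h_x, bool fast) {
    int n = static_cast<int>(h_x.size());
    size_t bytes = n * sizeof(float);
    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);
    int grid = (n + 255) / 256;
    if (fast) sigmoid_fast<<<grid, 256>>>(d_x, d_y, n);
    else      sigmoid_precise<<<grid, 256>>>(d_x, d_y, n);
    std::vector<float> h_y(n);
    cudaMemcpy(h_y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    return h_y;
}
```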

Grid Stride Loop

Loop pattern in which each thread processes several elements spaced apart by the total number of threads in the grid, so a grid of any size can process a dataset with more elements than threads.
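Here is a minimal sketch of a grid-stride SAXPY (`y = a*x + y`); the names are illustrative. The launch deliberately uses a small fixed grid of 4 x 128 threads. Each thread therefore handles many elements, and the kernel stays correct for any `n`. In practice the grid is usually sized to the device, for example as a small multiple of the number of multiprocessors.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void saxpy_grid_stride(int n, float a, const float* x, float* y) {
    int stride = blockDim.x * gridDim.x;              // total number of threads in the grid
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        y[i] = a * x[i] + y[i];
}

// Host helper; error checking omitted for brevity.
std::vector<float> saxpy(float a, const std::vector<float>& h_x, const std::vector<float>& h_y) {
    int n = static_cast<int>(h_x.size());
    size_t bytes = n * sizeof(float);
    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y.data(), bytes, cudaMemcpyHostToDevice);
    saxpy_grid_stride<<<4, 128>>>(n, a, d_x, d_y);    // 512 threads, whatever n is
    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    return out;
}
```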

Loop Unrolling

Optimization technique that replicates the loop body so the loop runs fewer iterations, or disappears entirely. This reduces loop-control overhead and increases instruction-level parallelism.
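Here is a minimal sketch of explicit unrolling in CUDA. It assumes a hypothetical task in which each thread sums a fixed number `ITEMS` of consecutive elements. Because the trip count is a compile-time template parameter, `#pragma unroll` can replicate the body completely; `#pragma unroll 4` would instead unroll by a factor of four. nvcc often unrolls small constant-trip-count loops on its own, so the pragma mainly makes the intent explicit.

```cuda
#include <cuda_runtime.h>
#include <vector>

// out[c] = in[c*ITEMS] + ... + in[c*ITEMS + ITEMS - 1]
template <int ITEMS>
__global__ void chunk_sums(const float* in, float* out, int num_chunks) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= num_chunks) return;
    const float* p = in + static_cast<size_t>(c) * ITEMS;
    float acc = 0.0f;
    // Fully unrolled: ITEMS copies of the body with no counter, compare, or branch,
    // and the independent loads can be issued back to back.
#pragma unroll
    for (int k = 0; k < ITEMS; ++k) acc += p[k];
    out[c] = acc;
}

// Host helper for ITEMS = 8; h_in.size() is assumed to be a multiple of 8.
// Error checking omitted for brevity.
std::vector<float> chunk_sums8(const std::vector<float>& h_in) {
    int chunks = static_cast<int>(h_in.size() / 8);
    float *d_in, *d_out;
    cudaMalloc(&d_in, h_in.size() * sizeof(float));
    cudaMalloc(&d_out, chunks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float), cudaMemcpyHostToDevice);
    chunk_sums<8><<<(chunks + 127) / 128, 128>>>(d_in, d_out, chunks);
    std::vector<float> h_out(chunks);
    cudaMemcpy(h_out.data(), d_out, chunks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```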

Memory Latency Hiding

Strategy of launching enough warps so the GPU can switch to ready warps while others wait for memory accesses.
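The CUDA occupancy API offers one way to check how many warps are available to hide latency. The sketch below uses a trivial, hypothetical `copy_kernel`. It asks how many blocks of a given size fit on one multiprocessor and converts that into occupancy, the fraction of the multiprocessor's warp slots that are filled. High occupancy is a means, not a goal: kernels with enough independent memory operations per thread can hide latency at lower occupancy.

```cuda
#include <cuda_runtime.h>

__global__ void copy_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Properties of the current device.
static cudaDeviceProp current_props() {
    int dev = 0;
    cudaGetDevice(&dev);
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    return prop;
}

// Warps of copy_kernel that can be resident on one SM at the given block size.
int resident_warps(int block_size) {
    int blocks_per_sm = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm, copy_kernel,
                                                  block_size, /*dynamicSMemSize=*/0);
    return blocks_per_sm * block_size / current_props().warpSize;
}

// Occupancy = resident warps / maximum warps per SM.
float occupancy(int block_size) {
    cudaDeviceProp prop = current_props();
    int max_warps = prop.maxThreadsPerMultiProcessor / prop.warpSize;
    return static_cast<float>(resident_warps(block_size)) / max_warps;
}

// Block size the runtime suggests for maximum occupancy of copy_kernel.
int suggested_block_size() {
    int min_grid = 0, block = 0;
    cudaOccupancyMaxPotentialBlockSize(&min_grid, &block, copy_kernel, 0, 0);
    return block;
}
```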

Vector Memory Operations

Instructions that transfer several values in a single access between global memory and registers, for example float2 or float4 (8- or 16-byte loads and stores). They reduce the number of memory instructions and improve effective bandwidth.
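Here is a minimal sketch of a copy kernel that moves data as `float4`, one 16-byte load and store per iteration. It handles the 0 to 3 leftover elements with scalar accesses. It assumes both pointers are 16-byte aligned, which holds for buffers returned by `cudaMalloc` but not for arbitrary offsets into them. The kernel and helper names are illustrative.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void copy_vec4(const float* __restrict__ in, float* __restrict__ out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    int n4 = n / 4;                                   // number of whole float4 chunks
    const float4* in4 = reinterpret_cast<const float4*>(in);
    float4* out4 = reinterpret_cast<float4*>(out);
    for (int i = tid; i < n4; i += stride)
        out4[i] = in4[i];                             // one 16-byte load and store
    for (int i = 4 * n4 + tid; i < n; i += stride)
        out[i] = in[i];                               // scalar tail (0 to 3 elements)
}

// Host helper; error checking omitted for brevity.
std::vector<float> copy_on_gpu(const std::vector<float>& h_in) {
    int n = static_cast<int>(h_in.size());
    size_t bytes = n * sizeof(float);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);                         // base pointers are suitably aligned
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    copy_vec4<<<8, 128>>>(d_in, d_out, n);
    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```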

Cooperative Groups

CUDA API for defining, partitioning, and synchronizing groups of threads at granularities other than the traditional thread block, from warp-sized tiles up to the whole grid. This simplifies complex communication patterns.
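Here is a minimal sketch of a block-level sum using Cooperative Groups; the kernel and helper names are illustrative. The block is partitioned into 32-thread tiles, and each tile reduces its values with `shfl_down`. The first thread of each tile then adds the partial sum to the result with `atomicAdd`. The entire grid can also be synchronized with `cg::this_grid().sync()`, but only if the kernel is launched with `cudaLaunchCooperativeKernel`, which this sketch does not do.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

__global__ void sum_with_tiles(const float* in, float* out, int n) {
    cg::thread_block block = cg::this_thread_block();
    auto tile = cg::tiled_partition<32>(block);       // 32-thread tiles of this block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;                 // out-of-range threads contribute 0
    // Tree reduction within the tile through register shuffles; every lane participates.
    for (int offset = 16; offset > 0; offset /= 2)
        v += tile.shfl_down(v, offset);
    if (tile.thread_rank() == 0)                      // lane 0 now holds the tile's sum
        atomicAdd(out, v);
}

// Host helper; error checking omitted for brevity.
float tile_sum(const std::vector<float>& h_in) {
    int n = static_cast<int>(h_in.size());
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));              // all-zero bytes represent 0.0f
    sum_with_tiles<<<(n + 255) / 256, 256>>>(d_in, d_out, n);   // block size: multiple of 32
    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

The order of the floating-point atomic additions is not fixed, so for non-integer data the last bits of the result can vary between runs.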

Texture Memory Caching

Reading data through the texture path, whose cache is optimized for spatial locality. This is particularly effective for access patterns with 2D locality.
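Here is a minimal sketch of reading a 2D image through a texture object for a 3 x 3 box blur; the helper names are illustrative. Neighbouring threads read overlapping neighbourhoods, which is the access pattern the texture cache targets. Clamp addressing makes out-of-range reads return the nearest edge pixel without explicit bounds checks. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void box_blur(cudaTextureObject_t tex, float* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    float s = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            s += tex2D<float>(tex, x + dx + 0.5f, y + dy + 0.5f);  // +0.5 hits texel centres
    out[y * width + x] = s / 9.0f;
}

std::vector<float> blur(const std::vector<float>& img, int width, int height) {
    // Copy the image into a CUDA array, the usual backing store for 2D textures.
    cudaChannelFormatDesc ch = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &ch, width, height);
    cudaMemcpy2DToArray(arr, 0, 0, img.data(), width * sizeof(float),
                        width * sizeof(float), height, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;   // out-of-range reads return the edge texel
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;        // no interpolation
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;                    // coordinates are in texels
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float* d_out;
    cudaMalloc(&d_out, width * height * sizeof(float));
    dim3 block(16, 16), grid((width + 15) / 16, (height + 15) / 16);
    box_blur<<<grid, block>>>(tex, d_out, width, height);
    std::vector<float> out(width * height);
    cudaMemcpy(out.data(), d_out, out.size() * sizeof(float), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(d_out);
    return out;
}
```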

Atomic Operations Optimization

Techniques for reducing contention on atomic operations. A common approach is to aggregate results locally in shared memory before updating global memory.
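Here is a minimal sketch of privatization for a 256-bin byte histogram; the names are illustrative. Each block counts into its own histogram in shared memory, so contention is limited to the threads of one block and the shared-memory atomics are cheap. The block then adds each non-zero bin to the global histogram. That costs at most 256 global atomics per block instead of one per input byte.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int NUM_BINS = 256;

__global__ void histogram_privatized(const unsigned char* data, int n, unsigned int* bins) {
    __shared__ unsigned int local[NUM_BINS];           // this block's private histogram
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();

    // Grid-stride loop: the heavily contended atomics hit on-chip shared memory.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x)
        atomicAdd(&local[data[i]], 1u);
    __syncthreads();

    // Merge: at most one global atomic per bin per block.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        if (local[b] != 0) atomicAdd(&bins[b], local[b]);
}

// Host helper; error checking omitted for brevity.
std::vector<unsigned int> histogram(const std::vector<unsigned char>& h_data) {
    int n = static_cast<int>(h_data.size());
    unsigned char* d_data;
    unsigned int* d_bins;
    cudaMalloc(&d_data, n);
    cudaMalloc(&d_bins, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(d_data, h_data.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, NUM_BINS * sizeof(unsigned int));
    histogram_privatized<<<32, 256>>>(d_data, n, d_bins);
    std::vector<unsigned int> h_bins(NUM_BINS);
    cudaMemcpy(h_bins.data(), d_bins, NUM_BINS * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_bins);
    return h_bins;
}
```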

Kernel Launch Overhead

The fixed time cost of launching a GPU kernel. It can be reduced by merging several small kernels into one larger kernel or by using dynamic parallelism.
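Here is a minimal sketch of kernel fusion, assuming a hypothetical two-step computation `y = a*x + b`. The unfused version pays for two launches, and it writes the intermediate `tmp` to global memory only to read it back. The fused kernel does both steps in one launch and keeps the intermediate in a register. The two results can differ in the last bit, because the compiler may contract the fused `a * x[i] + b` into a single fused multiply-add.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Unfused step 1: tmp = a * x
__global__ void scale(const float* x, float* tmp, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = a * x[i];
}

// Unfused step 2: y = tmp + b
__global__ void add_bias(const float* tmp, float* y, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = tmp[i] + b;
}

// Fused: y = a * x + b in a single launch; the intermediate never leaves a register.
__global__ void scale_add_bias(const float* x, float* y, float a, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + b;
}

// Host helper; error checking omitted for brevity.
std::vector<float> run_scale_bias(const std::vector<float>& h_x, float a, float b, bool fused) {
    int n = static_cast<int>(h_x.size());
    size_t bytes = n * sizeof(float);
    float *d_x, *d_tmp, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_tmp, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);
    int block = 256, grid = (n + block - 1) / block;
    if (fused) {
        scale_add_bias<<<grid, block>>>(d_x, d_y, a, b, n);   // one launch
    } else {
        scale<<<grid, block>>>(d_x, d_tmp, a, n);             // two launches plus a
        add_bias<<<grid, block>>>(d_tmp, d_y, b, n);          // global-memory round trip
    }
    std::vector<float> h_y(n);
    cudaMemcpy(h_y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_tmp);
    cudaFree(d_y);
    return h_y;
}
```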

Work Distribution Balance

Distributing work evenly among threads to avoid load imbalance, in which some threads finish much earlier than others and then sit idle.
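One common rebalancing technique assigns a warp, rather than a thread, to each row of a CSR sparse matrix. Here is a minimal sketch of it applied to computing row sums; the names are illustrative. With one thread per row, a single long row keeps its warp busy while the lanes with short rows sit idle. With one warp per row, even a long row's nonzeros are split across 32 lanes. Which mapping wins depends on the distribution of row lengths.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per row: per-thread work equals the row length, which may vary widely.
__global__ void row_sums_thread_per_row(const int* row_ptr, const float* vals,
                                        float* out, int rows) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    float s = 0.0f;
    for (int j = row_ptr[r]; j < row_ptr[r + 1]; ++j) s += vals[j];
    out[r] = s;
}

// One warp per row: lanes stride over the row, so per-lane work differs by at most one.
__global__ void row_sums_warp_per_row(const int* row_ptr, const float* vals,
                                      float* out, int rows) {
    int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    int lane = threadIdx.x % 32;
    if (warp >= rows) return;                     // whole warp exits together
    float s = 0.0f;
    for (int j = row_ptr[warp] + lane; j < row_ptr[warp + 1]; j += 32) s += vals[j];
    for (int off = 16; off > 0; off /= 2)         // warp-level tree reduction
        s += __shfl_down_sync(0xffffffffu, s, off);
    if (lane == 0) out[warp] = s;
}

// Host helper; blockDim must be a multiple of 32. Error checking omitted for brevity.
std::vector<float> row_sums(const std::vector<int>& row_ptr, const std::vector<float>& vals,
                            bool warp_per_row) {
    int rows = static_cast<int>(row_ptr.size()) - 1;
    int* d_ptr;
    float *d_vals, *d_out;
    cudaMalloc(&d_ptr, row_ptr.size() * sizeof(int));
    cudaMalloc(&d_vals, vals.size() * sizeof(float));
    cudaMalloc(&d_out, rows * sizeof(float));
    cudaMemcpy(d_ptr, row_ptr.data(), row_ptr.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals, vals.data(), vals.size() * sizeof(float), cudaMemcpyHostToDevice);
    const int block = 128;
    int threads = warp_per_row ? 32 * rows : rows;
    int grid = (threads + block - 1) / block;
    if (warp_per_row) row_sums_warp_per_row<<<grid, block>>>(d_ptr, d_vals, d_out, rows);
    else              row_sums_thread_per_row<<<grid, block>>>(d_ptr, d_vals, d_out, rows);
    std::vector<float> h_out(rows);
    cudaMemcpy(h_out.data(), d_out, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_ptr);
    cudaFree(d_vals);
    cudaFree(d_out);
    return h_out;
}
```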

Prefetching Strategy

Technique of loading data into shared memory ahead of its use, so that the latency of global memory accesses overlaps with computation on data that is already loaded.
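Here is a minimal sketch of software prefetching with a register and a shared-memory staging buffer. The task, reversing every 256-element tile of an array, is deliberately simple, and the names are illustrative. Each block walks over several tiles and issues the global load for the next tile before consuming the tile already in shared memory. The overlap here is small because the per-tile work is tiny; it grows with the amount of computation done on each staged tile.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 256;                         // must equal blockDim.x

__global__ void reverse_tiles_prefetch(const float* in, float* out, int num_tiles) {
    __shared__ float buf[TILE];
    int tx = threadIdx.x;
    int t = blockIdx.x;                           // first tile handled by this block
    if (t >= num_tiles) return;                   // uniform across the block
    float next = in[t * TILE + tx];               // prefetch the first tile
    for (; t < num_tiles; t += gridDim.x) {
        buf[tx] = next;                           // stage the current tile
        __syncthreads();
        int t_next = t + gridDim.x;
        if (t_next < num_tiles)
            next = in[t_next * TILE + tx];        // issue the next tile's load now...
        out[t * TILE + tx] = buf[TILE - 1 - tx];  // ...while consuming the current tile
        __syncthreads();                          // all reads of buf done before overwrite
    }
}

// Host helper; h_in.size() is assumed to be a multiple of TILE.
// Error checking omitted for brevity.
std::vector<float> reverse_tiles(const std::vector<float>& h_in) {
    int num_tiles = static_cast<int>(h_in.size() / TILE);
    size_t bytes = h_in.size() * sizeof(float);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    reverse_tiles_prefetch<<<3, TILE>>>(d_in, d_out, num_tiles);  // 3 blocks, several tiles each
    std::vector<float> h_out(h_in.size());
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```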
