🏠 Trang chủ
Benchmark
📊 Tất cả benchmark 🦖 Khủng long v1 🦖 Khủng long v2 ✅ Ứng dụng To-Do List 🎨 Trang tự do sáng tạo 🎯 FSACB - Trình diễn cuối cùng 🌍 Benchmark dịch thuật
Mô hình
🏆 Top 10 mô hình 🆓 Mô hình miễn phí 📋 Tất cả mô hình ⚙️ Kilo Code
Tài nguyên
💬 Thư viện prompt 📖 Thuật ngữ AI 🔗 Liên kết hữu ích

Thuật ngữ AI

Từ điển đầy đủ về Trí tuệ nhân tạo

162
danh mục
2.032
danh mục con
23.060
thuật ngữ
📖
thuật ngữ

FP16 Operations

Half-precision floating-point calculations (16 bits) offering up to 8x more throughput than FP32 on Tensor Cores, with significant reduction in memory bandwidth and energy consumption.

📖
thuật ngữ

TensorFloat-32 (TF32)

NVIDIA hybrid numerical format using 8 exponent bits (like FP32) and 10 mantissa bits (like FP16), offering an optimal compromise between dynamic range and precision for Ampere Tensor Cores.

📖
thuật ngữ

Warp Matrix Multiply-Accumulate (WMMA)

CUDA API allowing warps of 32 threads to efficiently perform matrix multiply-accumulate operations directly on Tensor Cores with access to fragmented registers.

📖
thuật ngữ

CUDA Kernels for Tensor Cores

GPU programs specifically optimized to leverage Tensor Core instructions, using WMMA primitives or high-level libraries for maximum matrix throughput.

📖
thuật ngữ

Matrix Fragmentation

Technique of partitioning matrices into smaller fragments distributed among warp threads for parallel execution on Tensor Core units, optimizing computational resource utilization.

📖
thuật ngữ

Tensor Core Utilization

Metric measuring the percentage of cycles where Tensor Cores perform useful calculations, crucial for evaluating optimization effectiveness and identifying bottlenecks.

📖
thuật ngữ

INT8 Quantization for Inference

Conversion of neural network weights and activations to 8-bit integers, enabling up to 32x acceleration on Tensor Cores with controlled precision degradation.

📖
thuật ngữ

CublasLt Tensor Core Library

CUBLAS library extension optimized for Tensor Cores, offering high-performance GEMM (General Matrix Multiply) routines with native support for mixed-precision formats.

📖
thuật ngữ

Shared Memory Tiling

Strategy for organizing data in GPU shared memory into optimal tiles for Tensor Core access, minimizing bank conflicts and maximizing bandwidth.

📖
thuật ngữ

Warp-level Matrix Scheduling

Scheduling of matrix operations at the warp level to maximize Tensor Core pipeline utilization, accounting for latencies and data dependencies.

📖
thuật ngữ

Tensor Core Register Pressure

Constraint related to the limited number of registers per SM, affecting the ability to parallelize Tensor Core operations and requiring a balance between occupancy and efficient unit utilization.

📖
thuật ngữ

Deep Learning Benchmarks

Test suites like MLPerf that evaluate Tensor Core optimization performance on real neural network training and inference workloads.

📖
thuật ngữ

Automatic Mixed Precision (AMP)

Automatic operational precision selection technique that identifies eligible Tensor Core operations and maintains FP32 copies for numerical stability.

📖
thuật ngữ

Tensor Core Memory Coalescing

Memory access optimization to align with Tensor Core access patterns, grouping transactions into contiguous accesses to maximize throughput.

📖
thuật ngữ

Sparse Matrix Support

Ampere Tensor Cores' ability to efficiently process structured sparse matrices, offering up to 2x acceleration for neural networks with sparsity.

🔍

Không tìm thấy kết quả