Tensor Core Optimization
Warp Matrix Multiply-Accumulate (WMMA)
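A CUDA C++ API (namespace nvcuda::wmma, header mma.h) that lets all 32 threads of a warp cooperatively perform a matrix multiply-accumulate, D = A × B + C, directly on Tensor Cores. Operands live in opaque "fragments": register-resident tiles (for example 16×16×16 with half-precision inputs) whose elements are distributed across the threads of the warp. Every WMMA call must be executed by the whole warp together.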
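The sequence of WMMA calls can be illustrated with a minimal sketch, assuming a GPU of compute capability 7.0 or newer (compile with e.g. nvcc -arch=sm_70), half-precision inputs, float accumulation, and a single 16×16×16 tile computed by exactly one warp. The kernel name wmma_tile_kernel and the host helper wmma_tile_matmul are hypothetical names chosen for this example. The wmma:: calls are the standard ones from mma.h.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <vector>

using namespace nvcuda;

// Tile shape (M, N, K) for half inputs with float accumulation.
constexpr int WM = 16, WN = 16, WK = 16;

// Hypothetical kernel: one warp computes D = A * B + C for one 16x16 tile.
// A is row-major half, B is column-major half, C and D are row-major float.
__global__ void wmma_tile_kernel(const half* a, const half* b, const float* c, float* d) {
    // Fragments are opaque; each thread holds only part of every tile.
    wmma::fragment<wmma::matrix_a, WM, WN, WK, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, WM, WN, WK, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, WM, WN, WK, float> acc_frag;

    // All 32 threads must reach each *_sync call together.
    wmma::load_matrix_sync(a_frag, a, WK);  // row-major A (M x K): leading dim = K
    wmma::load_matrix_sync(b_frag, b, WK);  // col-major B (K x N): leading dim = K
    wmma::load_matrix_sync(acc_frag, c, WN, wmma::mem_row_major);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // acc = A * B + acc
    wmma::store_matrix_sync(d, acc_frag, WN, wmma::mem_row_major);
}

// Hypothetical host helper: converts A and B to half, launches one warp,
// and copies D back. a_rm, c_rm, d_rm are row-major; b_cm is column-major.
// Returns false if any CUDA call fails.
bool wmma_tile_matmul(const float* a_rm, const float* b_cm, const float* c_rm, float* d_rm) {
    const int n = WM * WK;  // every operand here is a 16x16 tile (256 elements)
    std::vector<half> ha(n), hb(n);
    for (int i = 0; i < n; ++i) {
        ha[i] = __float2half(a_rm[i]);
        hb[i] = __float2half(b_cm[i]);
    }

    half *da = nullptr, *db = nullptr;
    float *dc = nullptr, *dd = nullptr;
    // cudaMalloc returns pointers aligned well beyond the 256-bit WMMA requirement.
    bool ok = cudaMalloc(&da, n * sizeof(half)) == cudaSuccess &&
              cudaMalloc(&db, n * sizeof(half)) == cudaSuccess &&
              cudaMalloc(&dc, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&dd, n * sizeof(float)) == cudaSuccess;
    if (ok) {
        ok = cudaMemcpy(da, ha.data(), n * sizeof(half), cudaMemcpyHostToDevice) == cudaSuccess &&
             cudaMemcpy(db, hb.data(), n * sizeof(half), cudaMemcpyHostToDevice) == cudaSuccess &&
             cudaMemcpy(dc, c_rm, n * sizeof(float), cudaMemcpyHostToDevice) == cudaSuccess;
    }
    if (ok) {
        wmma_tile_kernel<<<1, 32>>>(da, db, dc, dd);  // exactly one warp
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(d_rm, dd, n * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    cudaFree(dd);
    return ok;
}
```

Real kernels tile a large matrix across many warps and loop over the K dimension, calling mma_sync repeatedly on the same accumulator fragment. This sketch keeps to one tile so that the load, multiply-accumulate, and store sequence stays visible.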