Quantization and Compression
Blockwise Quantization
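A technique that divides a weight or activation tensor into smaller blocks and quantizes each block independently, typically with its own scale factor. Because each block's scale follows the local magnitude range, an outlier only degrades the precision of its own block rather than the whole tensor, which reduces overall quantization error compared with a single tensor-wide scale.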
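To make the idea concrete, here is a minimal NumPy sketch of one common variant: symmetric absmax quantization to int8 with one scale per block. The function names, the block size of 64, and the choice of int8 are illustrative assumptions, not part of the definition above; real implementations differ in bit width, block layout, and how scales are stored.

```python
import numpy as np

def quantize_blockwise(x, block_size=64):
    """Symmetric absmax int8 quantization with one scale per block (illustrative)."""
    flat = np.asarray(x, dtype=np.float32).ravel()
    # Zero-pad so the flattened tensor splits evenly into blocks.
    pad = (-flat.size) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    # Each block gets its own scale, derived from its largest magnitude.
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    scales = np.where(absmax > 0, absmax / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.round(blocks / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize_blockwise(q, scales, shape):
    """Rebuild a float32 tensor of the given shape, dropping any padding."""
    n = int(np.prod(shape))
    return (q.astype(np.float32) * scales).ravel()[:n].reshape(shape)

# Compare against a single tensor-wide scale on data with one outlier.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 256)).astype(np.float32)
x[0, 0] = 100.0  # the outlier inflates the scale of whatever block contains it

q_blk, s_blk = quantize_blockwise(x, block_size=64)
q_all, s_all = quantize_blockwise(x, block_size=x.size)  # one block = per-tensor

err_blockwise = np.abs(x - dequantize_blockwise(q_blk, s_blk, x.shape)).mean()
err_per_tensor = np.abs(x - dequantize_blockwise(q_all, s_all, x.shape)).mean()
print(err_blockwise, err_per_tensor)
```

In this setup the outlier coarsens only the one block that contains it, so the blockwise error should come out well below the per-tensor error. The trade-off is storage: smaller blocks mean more scale values to keep alongside the quantized data.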