Quantization and Compression
Weight Sharing
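A compression technique that groups a model's weights into clusters and stores, for each weight, only the index of its cluster's centroid. With k clusters, each index needs only ceil(log2(k)) bits, plus one small shared table holding the k centroid values. During inference, the actual weight values are recovered by looking up each index in that table.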
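The idea can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the entry does not name a clustering algorithm, so plain 1-D k-means with evenly spaced initial centroids is assumed here, and the names `share_weights`, `reconstruct`, `n_clusters`, and `n_iters` are hypothetical.

```python
import numpy as np

def share_weights(weights, n_clusters=4, n_iters=20):
    """Cluster weights with 1-D k-means (an assumed choice of algorithm).

    Returns (codebook, indices): the centroid values and, for each weight,
    the index of its nearest centroid.
    """
    flat = weights.ravel()
    # Assumed initialization: spread centroids evenly over the weight range.
    codebook = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iters):
        # Assign each weight to its nearest centroid.
        indices = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        # Move each centroid to the mean of the weights assigned to it.
        for k in range(n_clusters):
            members = flat[indices == k]
            if members.size:
                codebook[k] = members.mean()
    # Final assignment against the updated centroids.
    indices = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    # uint8 is enough for up to 256 clusters.
    return codebook, indices.reshape(weights.shape).astype(np.uint8)

def reconstruct(codebook, indices):
    """Recover approximate weights by table lookup."""
    return codebook[indices]

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)

codebook, idx = share_weights(W, n_clusters=4)
W_hat = reconstruct(codebook, idx)

# Bits needed to address one of the centroids.
bits_per_weight = int(np.ceil(np.log2(len(codebook))))
print("bits per weight index:", bits_per_weight)
print("distinct values after sharing:", len(np.unique(W_hat)))
```

Here the 64 weights of `W` are stored as 64 small indices plus a 4-entry codebook, and `reconstruct` is the lookup step used at inference. The reconstruction is lossy: `W_hat` contains at most `n_clusters` distinct values, so the number of clusters trades accuracy against size.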