Quantization and Compression
Low-Rank Matrix Factorization
A compression technique that approximates a large weight matrix as the product of two or more smaller matrices. When the chosen rank is small enough, this reduces the number of parameters and multiply-accumulate operations, which can speed up dense and convolutional layers.
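As a rough illustration, the sketch below factorizes one dense-layer weight matrix with a truncated SVD, which is one common way to obtain a low-rank approximation (the definition above does not prescribe a specific method). The helper name `low_rank_factorize`, the matrix sizes, and the rank of 16 are assumptions chosen for the example, and the weight matrix is synthetic.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) as A @ B, with A (m x rank) and B (rank x n).

    Hypothetical helper for illustration; uses a truncated SVD.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # fold the top singular values into the left factor
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
m, n, rank = 256, 512, 16  # assumed sizes, not taken from any real model

# Synthetic weight matrix: approximately rank 16, plus a little noise.
W = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
W += 0.01 * rng.standard_normal((m, n))

A, B = low_rank_factorize(W, rank)

# Applying the layer: one large product becomes two small ones.
x = rng.standard_normal(n)
y_full = W @ x        # m * n multiply-accumulates
y_low = A @ (B @ x)   # rank * (m + n) multiply-accumulates

params_full = W.size          # 256 * 512 = 131072
params_low = A.size + B.size  # 256 * 16 + 16 * 512 = 12288
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```

The factorization only saves parameters when the rank r satisfies r(m + n) < mn. Real weight matrices are usually not as close to low rank as this synthetic one, so the approximation error should be checked and the model is often fine-tuned after factorization.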