Quantization and Compression
Structured Pruning
Compression technique that removes entire weight structures, such as filters, channels, or attention heads, rather than individual weights. It is more effective for accelerating computation on modern hardware than unstructured pruning.
← Geri