Quantization and Compression
Unstructured Pruning
Compression method that eliminates individual weights in the network, typically those with the smallest magnitude. Although it can reduce model size, it requires specialized hardware support (sparsity) to accelerate computation.
← Indietro