Low-Resource Models
Structured Pruning
A model pruning technique that removes entire coherent components (attention heads, neurons, layers) rather than individual weights. This reduces model size and computational cost while keeping a dense architecture that remains compatible with hardware accelerators.
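As a rough illustration of removing whole neurons rather than individual weights, the sketch below prunes hidden units from a small two-layer network using NumPy. The selection rule (keep the neurons whose incoming weights have the largest L2 norm) is an assumption chosen for simplicity, not something the definition prescribes, and the names `prune_hidden_neurons`, `forward`, and `keep_ratio` are hypothetical.

```python
import numpy as np


def prune_hidden_neurons(w1, b1, w2, keep_ratio):
    """Remove whole hidden neurons from a two-layer MLP.

    w1: (hidden, in) weights, b1: (hidden,) biases, w2: (out, hidden) weights.
    Assumption: a neuron's importance is the L2 norm of its incoming weights.
    """
    hidden = w1.shape[0]
    n_keep = max(1, int(round(hidden * keep_ratio)))
    scores = np.linalg.norm(w1, axis=1)           # one score per neuron
    keep = np.sort(np.argsort(scores)[-n_keep:])  # indices of neurons to keep
    # Dropping a neuron removes its row in w1, its bias, and its column in w2,
    # so the result is a smaller but still dense network.
    return w1[keep], b1[keep], w2[:, keep], keep


def forward(x, w1, b1, w2):
    """Two-layer MLP with a ReLU hidden layer."""
    return w2 @ np.maximum(w1 @ x + b1, 0.0)


rng = np.random.default_rng(0)
w1 = rng.normal(size=(8, 4))
b1 = rng.normal(size=8)
w2 = rng.normal(size=(3, 8))

# Make two neurons "dead" so that removing them cannot change the output.
w1[[1, 5]] = 0.0
b1[[1, 5]] = 0.0

p1, pb1, p2, kept = prune_hidden_neurons(w1, b1, w2, keep_ratio=0.75)

x = rng.normal(size=4)
same_output = np.allclose(forward(x, w1, b1, w2), forward(x, p1, pb1, p2))
print(w1.shape, "->", p1.shape, "| output unchanged:", same_output)
```

Because the pruned matrices are simply smaller dense arrays, they run on the same matrix-multiply kernels as the original network. Unstructured pruning, by contrast, leaves the shapes unchanged and only zeroes individual entries, which typically needs sparse kernels to yield a speedup.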