Optimization and Computational Efficiency
Low-Rank Inference
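An approach that approximates a model's large weight matrices as products of lower-rank matrices, reducing both the parameter count and the number of multiplications needed during inference. For an m × n weight matrix factored at rank r, storage and per-vector multiplication cost fall from m·n to r·(m + n), a substantial saving when r is much smaller than min(m, n).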
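
The idea can be illustrated with a minimal NumPy sketch. The entry does not say how the low-rank factors are obtained, so truncated SVD is used here as one common choice and is an assumption; the function names `low_rank_factors` and `low_rank_matvec`, the matrix sizes, and the synthetic weight matrix are all hypothetical.

```python
import numpy as np

def low_rank_factors(W, rank):
    """Return (A, B) such that A @ B is the rank-`rank` truncated SVD of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # shape (m, rank); singular values folded into A
    B = Vt[:rank, :]            # shape (rank, n)
    return A, B

def low_rank_matvec(A, B, x):
    # Apply B first, then A: about rank * (m + n) multiplies instead of m * n.
    return A @ (B @ x)

rng = np.random.default_rng(0)
m, n, rank = 512, 256, 8

# Hypothetical weight matrix: exactly rank 8, plus a small amount of noise.
W = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
W += 1e-3 * rng.standard_normal((m, n))

A, B = low_rank_factors(W, rank)
x = rng.standard_normal(n)

full_params = W.size                # m * n
low_rank_params = A.size + B.size   # rank * (m + n)
y_full = W @ x
y_low = low_rank_matvec(A, B, x)
rel_error = np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full)

print(full_params, low_rank_params, rel_error)
```

The approximation is only this accurate because the synthetic matrix is close to low rank by construction. For real model weights the achievable rank, and the accuracy lost at a given rank, have to be measured.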