ML infrastructure management
Inference Optimization
A set of techniques (quantization, pruning, distillation) that reduce a model's latency and memory footprint during production inference.
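As an illustration of one of these techniques, here is a minimal sketch of symmetric per-tensor int8 quantization using NumPy; the function names and the choice of symmetric scaling are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 in [-127, 127] with a single scale factor.

    Illustrative sketch: assumes weights are not all zero (scale > 0).
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 tensor."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller than float32; rounding error per weight
# is bounded by scale / 2
```

Pruning and distillation trade accuracy for size in different ways (removing weights outright, or training a smaller student model), but quantization is often the simplest to apply post-training.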