Quantization and Optimization
Speculative Inference
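An inference acceleration technique for generative models. A small, fast "draft" model proposes several tokens ahead, and the large target model then verifies all of them in parallel in a single forward pass. Every accepted token saves one costly sequential forward pass of the target model, so the total number of expensive computation steps drops.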
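The propose-then-verify loop can be illustrated with a minimal sketch. It assumes the simpler greedy variant: a drafted token is accepted only if it equals the token the target model would pick itself. This differs from the probabilistic accept/reject rule used when sampling. The "models" are hypothetical toy functions, and the names `speculative_decode`, `greedy_decode`, `target_model`, and `draft_model` are illustrative only, not any library's API.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: maps a token prefix to the
# token it would choose next (greedy). Real models return probability distributions.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline autoregressive decoding: one model call per new token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, verification rounds)."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n_new:
        # 1. The cheap draft model proposes k tokens one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target model scores all k + 1 positions. A real system does this
        #    in one batched forward pass; here it is simulated with a loop.
        target_preds = [target(seq + proposal[:i]) for i in range(k + 1)]
        rounds += 1
        # 3. Accept the longest prefix on which draft and target agree.
        n_accept = 0
        while n_accept < k and proposal[n_accept] == target_preds[n_accept]:
            n_accept += 1
        # 4. Append the accepted tokens plus one token chosen by the target:
        #    a correction at the first mismatch, or a bonus token if all matched.
        seq += proposal[:n_accept] + [target_preds[n_accept]]
    # The last round may overshoot; trim to exactly n_new generated tokens.
    return seq[:len(prompt) + n_new], rounds


# Toy models (hypothetical): the draft counts up mod 10; the target does the
# same except that it emits 0 whenever the prefix length is a multiple of 7.
def target_model(prefix: List[int]) -> int:
    return 0 if len(prefix) % 7 == 0 else (prefix[-1] + 1) % 10


def draft_model(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


tokens, rounds = speculative_decode(target_model, draft_model, [1], n_new=20, k=4)
assert tokens == greedy_decode(target_model, [1], 20)  # identical output
print(rounds)  # 6 verification rounds instead of 20 sequential target calls
```

Because every kept token is one the target would have chosen anyway, the output matches plain greedy decoding of the target model. Only the number of sequential target passes changes. The speedup depends on how often the draft agrees with the target and on the lookahead `k`.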