GPT (Generative Pre-trained Transformer)
KV Cache
Inference optimization that caches the keys and values of previous tokens to avoid recalculating attention states at each new token generation.
← Wstecz