Multi-Token Prediction via Self-Distillation
arXiv:2602.06019v2
Abstract: Existing techniques for accelerating language model inference, such as speculative decoding, require training auxiliary speculator models and building and deploying complex inference pipelines. We consider a new approach for converting a pretrained autoregressive language model…
