Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
arXiv:2510.20064v1 Announce Type: new Abstract: Speculative decoding is widely used to accelerate large language model (LLM) inference. In this work, we focus on the online draft model selection problem in speculative decoding. We design an algorithm that provably competes with…
