arXiv:2506.04118v2 Announce Type: replace-cross
Abstract: We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-$n$ test-time scaling with a reward model $r(x,y)$ and speculative samples from a small auxiliary model $\pi_S(y\mid x)$. We provably approximate both the optimal tilted policy $\pi_{\beta,B}(y\mid x) \propto \pi_B(y\mid x)\exp(\beta\,r(x,y))$ of soft best-of-$n$ under the base model $\pi_B$ and the expected reward under that optimal policy. In experiments on reasoning benchmarks (MATH500, OlympiadBench, Minerva Math, MMLU-STEM, GSM8K), our method achieves higher accuracy than both standard soft best-of-$n$ with $\pi_S$ and reward-guided speculative decoding (Liao et al., 2025), and in certain settings even outperforms soft best-of-$n$ with $\pi_B$. The code is available at https://github.com/j-geuter/GSI .
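
As an illustration of the soft best-of-$n$ step only, the following is a minimal Python sketch, not the GSI algorithm itself and not taken from the linked repository. It assumes the $n$ candidates have already been sampled and scored by the reward model, and it picks one with probability proportional to $\exp(\beta\,r(x,y_i))$. The names `softmax_weights` and `soft_best_of_n` are hypothetical. How GSI corrects for drawing candidates from $\pi_S$ instead of $\pi_B$ is not described in the abstract and is not shown here.

```python
import math
import random


def softmax_weights(rewards, beta):
    """Return normalized weights proportional to exp(beta * reward).

    Subtracting the maximum reward before exponentiating avoids overflow
    and does not change the normalized result.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    top = max(rewards)
    unnormalized = [math.exp(beta * (r - top)) for r in rewards]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def soft_best_of_n(candidates, rewards, beta, rng=random):
    """Sample one candidate with probability proportional to exp(beta * reward).

    `candidates` are n responses assumed to be drawn from a single model,
    and `rewards` are their reward-model scores r(x, y_i) (both are
    illustrative inputs, not part of any published API).
    """
    if len(candidates) != len(rewards):
        raise ValueError("candidates and rewards must have the same length")
    weights = softmax_weights(rewards, beta)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With $\beta = 0$ the choice is uniform over the $n$ candidates, and as $\beta$ grows the procedure approaches hard best-of-$n$, which always returns the highest-reward candidate.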
