Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
arXiv:2507.01679v2 Announce Type: replace
Abstract: Existing post-training techniques for large language models are broadly categorized into Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT). Each paradigm presents a distinct trade-off: SFT excels at mimicking demonstration data but can lead to problematic…
