Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning
arXiv:2510.27044v1
Abstract: Mathematical reasoning is a central challenge for large language models (LLMs), requiring not only correct answers but also faithful reasoning processes. Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising approach for enhancing…
