Learning to Reason as Action Abstractions with Scalable Mid-Training RL
arXiv:2509.25810v2
Abstract: Large language models excel with reinforcement learning (RL), but fully unlocking this potential requires a mid-training stage. An effective mid-training phase should identify a compact set of useful actions and enable fast selection among them…
