Arbitrary Entropy Policy Optimization: Entropy Is Controllable in Reinforcement Fine-tuning
arXiv:2510.08141v3 Announce Type: replace Abstract: Reinforcement fine-tuning (RFT) is essential for enhancing the reasoning capabilities of large language models (LLMs), yet the widely adopted Group Relative Policy Optimization (GRPO) suffers from entropy collapse, where entropy monotonically decreases, exploration vanishes, and…
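
To make the quantity behind "entropy collapse" concrete, here is a minimal sketch of the Shannon entropy of a softmax policy over a toy set of four actions. It is neither GRPO nor the method proposed in the paper. The logits, the `scales` values, and the helper names `softmax` and `entropy` are assumptions chosen for illustration. Multiplying the logits by a growing factor stands in, very loosely, for a policy that becomes more peaked during training.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits for a 4-action policy (illustrative values only).
logits = [2.0, 1.0, 0.5, 0.0]

# Larger scale -> sharper distribution -> lower entropy.
scales = [0.0, 1.0, 2.0, 4.0, 8.0]
entropies = [entropy(softmax([s * x for x in logits])) for s in scales]

for s, h in zip(scales, entropies):
    print(f"scale={s:>4}: entropy={h:.4f} nats")
```

At scale 0 the policy is uniform and the entropy equals its maximum, log 4. As the scale grows, probability mass concentrates on a single action and the entropy falls toward zero, which is the regime in which exploration stops.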
