Beyond Introspection: Reinforcing Thinking via Externalist Behavioral Feedback
arXiv:2501.01457v3 Announce Type: replace
Abstract: While inference-time thinking allows Large Language Models (LLMs) to address complex problems, the extended thinking process can be unreliable or inconsistent because of the model’s probabilistic nature, especially near its knowledge boundaries. Existing approaches attempt…
