SPACeR: Self-Play Anchoring with Centralized Reference Models

arXiv:2510.18060v2 Announce Type: replace Abstract: Developing autonomous vehicles (AVs) requires not only safety and efficiency, but also realistic, human-like behaviors that are socially aware and predictable. Achieving this requires sim agent policies that are human-like, fast, and scalable in multi-agent…

On the Structural Non-Preservation of Epistemic Behaviour under Policy Transformation

arXiv:2602.21424v1 Announce Type: new Abstract: Reinforcement learning (RL) agents under partial observability often condition actions on internally accumulated information such as memory or inferred latent context. We formalise such information-conditioned interaction patterns as behavioural dependency: variation in action selection with…

Spurious Rewards: Rethinking Training Signals in RLVR

arXiv:2506.10947v2 Announce Type: replace-cross Abstract: We show that reinforcement learning with verifiable rewards (RLVR) can elicit strong mathematical reasoning in certain language models even with spurious rewards that have little, no, or even negative correlation with the correct answer. For…

Optimizer choice matters for the emergence of Neural Collapse

arXiv:2602.16642v3 Announce Type: replace Abstract: Neural Collapse (NC) refers to the emergence of highly symmetric geometric structures in the representations of deep neural networks during the terminal phase of training. Despite its prevalence, the theoretical understanding of NC remains limited.…