Recontextualization Mitigates Specification Gaming without Modifying the Specification
arXiv:2512.19027v1 Announce Type: cross
Abstract: Developers often struggle to specify correct training labels and rewards. Perhaps they don’t need to. We propose recontextualization, which reduces how often language models “game” training signals, that is, perform misbehaviors that those signals mistakenly reinforce. We show…
