Consequentialist Objectives and Catastrophe
arXiv:2603.15017v2 Announce Type: replace-cross Abstract: Because human preferences are too complex to codify, AIs operate with misspecified objectives. Optimizing such objectives often produces undesirable outcomes; this phenomenon is known as reward hacking. Such outcomes are not necessarily catastrophic. Indeed, most…
