Safety-Biased Policy Optimisation: Towards Hard-Constrained Reinforcement Learning via Trust Regions
arXiv:2512.23770v1 Abstract: Reinforcement learning (RL) in safety-critical domains requires agents to maximise rewards while strictly adhering to safety constraints. Existing approaches, such as Lagrangian and projection-based methods, often either fail to ensure near-zero safety violations or sacrifice…
