Offline Constrained RLHF with Multiple Preference Oracles
arXiv:2604.00200v1 Announce Type: new
Abstract: We study offline constrained reinforcement learning from human feedback with multiple preference oracles. Motivated by applications that trade off performance with safety or fairness, we aim to maximize target population utility subject to a minimum…
