Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning
arXiv:2509.26605v1 Announce Type: cross

Abstract: Deploying reinforcement learning (RL) in robotics, industry, and health care is blocked by two obstacles: the difficulty of specifying accurate rewards and the risk of unsafe, data-hungry exploration. We address this by proposing a two-stage…
