PROF: An LLM-based Reward Code Preference Optimization Framework for Offline Imitation Learning
arXiv:2511.13765v1 (Announce Type: new)
Abstract: Offline imitation learning (offline IL) enables training effective policies without requiring explicit reward annotations. Recent approaches attempt to estimate rewards for unlabeled datasets using a small set of expert demonstrations. However, these methods often assume…
