CROP: Conservative Reward for Model-based Offline Policy Optimization
arXiv:2310.17245v2 Announce Type: replace
Abstract: Offline reinforcement learning (RL) aims to optimize a policy using collected data without online interactions. Model-based approaches are particularly appealing for addressing offline RL challenges because of their capability to mitigate the limitations of data…
