Efficient Offline Reinforcement Learning: First Imitate, then Improve
arXiv:2406.13376v2 Announce Type: replace

Abstract: Supervised imitation-based approaches are often favored over off-policy reinforcement learning for learning policies offline, since their straightforward optimization objective makes them computationally efficient and stable to train. However, their performance is fundamentally limited by…
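The "straightforward optimization objective" the abstract refers to can be sketched as plain supervised regression onto logged expert actions (behavior cloning). The linear policy, synthetic dataset, and learning rate below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

# Minimal behavior-cloning sketch: fit a linear policy a = s @ W to logged
# (state, action) pairs with a mean-squared-error imitation objective.
rng = np.random.default_rng(0)
states = rng.normal(size=(128, 4))        # logged offline states (assumed)
expert_W = rng.normal(size=(4, 2))        # hypothetical expert policy
actions = states @ expert_W               # logged expert actions

W = np.zeros((4, 2))                      # policy parameters to learn

def bc_loss(W):
    # Supervised imitation objective: MSE between policy and logged actions.
    return float(np.mean((states @ W - actions) ** 2))

loss_before = bc_loss(W)
for _ in range(200):                      # plain gradient descent
    grad = 2.0 * states.T @ (states @ W - actions) / len(states)
    W -= 0.1 * grad
loss_after = bc_loss(W)
```

Because the loss is an ordinary supervised objective over a fixed dataset, training is as stable as standard regression, which is the computational appeal the abstract highlights; the limitation is that such a policy can at best match the behavior in the logged data.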
