AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization
arXiv:2405.18187v2 Announce Type: replace Abstract: Implicit Q-learning (IQL) serves as a strong baseline for offline RL, which learns the value function using only dataset actions through quantile regression. However, it is unclear how to recover the implicit policy from the…
