A Unifying View of Linear Function Approximation in Off-Policy RL Through Matrix Splitting and Preconditioning
arXiv:2501.01774v3 Announce Type: replace Abstract: In off-policy policy evaluation (OPE) tasks within reinforcement learning, Temporal Difference Learning (TD) and Fitted Q-Iteration (FQI) have traditionally been viewed as differing in the number of updates toward the target value function: TD makes one…
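The contrast the abstract draws, TD nudging the weights toward the Bellman target while FQI fully re-fits them, can be illustrated on a toy problem. The sketch below is not from the paper: the two-state chain, tabular features, and all function names are this example's own assumptions, chosen only to show one semi-gradient TD(0) step versus one full least-squares FQI iteration for linear policy evaluation.

```python
import numpy as np

# Toy MDP (assumed for illustration): 2 states, deterministic
# transitions 0 -> 1 -> 0 under the evaluated policy, rewards [1, 0].
rng = np.random.default_rng(0)
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # P[s, s'] under the evaluated policy
r = np.array([1.0, 0.0])
gamma = 0.9
Phi = np.eye(2)                 # tabular features, so V(s) = Phi[s] @ w

# TD(0): a single stochastic semi-gradient step per sampled transition.
def td0_step(w, s, s_next, alpha=0.1):
    delta = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w   # TD error
    return w + alpha * delta * Phi[s]

# FQI: each iteration fully fits w to the current Bellman targets
# by solving a least-squares regression.
def fqi_iteration(w):
    targets = r + gamma * P @ (Phi @ w)                    # Bellman backup
    w_new, *_ = np.linalg.lstsq(Phi, targets, rcond=None)  # full re-fit
    return w_new

# Run both; each should approach the true V = (I - gamma * P)^{-1} r.
w_td = np.zeros(2)
s = 0
for _ in range(5000):
    s_next = int(rng.choice(2, p=P[s]))
    w_td = td0_step(w_td, s, s_next)
    s = s_next

w_fqi = np.zeros(2)
for _ in range(200):
    w_fqi = fqi_iteration(w_fqi)

v_true = np.linalg.solve(np.eye(2) - gamma * P, r)
print(np.round(Phi @ w_td, 3), np.round(Phi @ w_fqi, 3), np.round(v_true, 3))
```

Both procedures converge to the same fixed point here; they differ in how far each update moves toward the Bellman target, which is the distinction the paper's matrix-splitting view reinterprets.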
