Constructing an Optimal Behavior Basis for the Option Keyboard
arXiv:2505.00787v2
Abstract: Multi-task reinforcement learning aims to quickly identify solutions for new tasks with minimal or no additional interaction with the environment. Generalized Policy Improvement (GPI) addresses this by combining a set of base policies to produce…
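
As an illustration of how GPI combines base policies, the sketch below uses the standard GPI rule from the literature: in a given state, take the maximum action value across the base policies for each action, then act greedily on that maximum. It is a minimal tabular sketch, not the paper's method. The function name `gpi_action` and the Q-values are hypothetical and chosen only for the example.

```python
from typing import Sequence


def gpi_action(q_values: Sequence[Sequence[float]]) -> int:
    """Pick an action by GPI for a single state.

    q_values[i][a] is the (assumed, already computed) action value of
    base policy i for action a in the current state.
    """
    n_actions = len(q_values[0])
    # For each action, keep the best value any base policy assigns to it.
    best_per_action = [max(q[a] for q in q_values) for a in range(n_actions)]
    # Act greedily with respect to that upper envelope.
    return max(range(n_actions), key=best_per_action.__getitem__)


# Hypothetical values: two base policies, three actions.
q = [
    [1.0, 0.2, 0.0],  # base policy 0
    [0.1, 0.3, 2.0],  # base policy 1
]
action = gpi_action(q)
```

Here the per-action maxima are 1.0, 0.3 and 2.0, so the GPI choice is the action at index 2, which base policy 1 values most highly.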
