Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning
arXiv:2506.21427v3
Abstract: Generative models such as diffusion and flow-matching offer expressive policies for offline reinforcement learning (RL) by capturing rich, multimodal action distributions, but their iterative sampling introduces high inference costs and training instability due to gradient…
