MEM: Multi-Scale Embodied Memory for Vision Language Action Models
arXiv:2603.03596v2
Announce Type: replace-cross

Abstract: Conventionally, memory in end-to-end robotic learning involves inputting a sequence of past observations into the learned policy. However, in complex multi-stage real-world tasks, the robot’s memory must represent past events at multiple levels of granularity:…
