Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation
arXiv:2604.17656v2 Announce Type: replace-cross Abstract: Video-to-music (V2M) is the fundamental task of creating background music for an input video. Recent V2M models typically achieve audiovisual alignment by relying on visual conditioning alone and provide limited semantic and stylistic controllability to…
