VNU-Bench: A Benchmarking Dataset for Multi-Source Multimodal News Video Understanding
arXiv:2601.03434v1 Announce Type: new
Abstract: News videos are carefully edited multimodal narratives that combine narration, visuals, and external quotations into coherent storylines. In recent years, there have been significant advances in evaluating multimodal large language models (MLLMs) for news video…
