Visually Prompted Benchmarks Are Surprisingly Fragile
arXiv:2512.17875v2
Abstract: A key challenge in evaluating vision-language models (VLMs) is testing their ability to analyze visual content independently of their textual priors. Recent benchmarks such as BLINK probe visual perception through visual prompting, where questions about visual content…
