Archives AI News

YRC-Bench: A Benchmark for Learning to Coordinate with Experts

arXiv:2502.09583v3 Announce Type: replace Abstract: When deployed in the real world, AI agents will inevitably face challenges that exceed their individual capabilities. A critical component of AI safety is an agent’s ability to recognize when it is likely to fail…

Visually Prompted Benchmarks Are Surprisingly Fragile

arXiv:2512.17875v2 Announce Type: replace-cross Abstract: A key challenge in evaluating VLMs is testing models’ ability to analyze visual content independently from their textual priors. Recent benchmarks such as BLINK probe visual perception through visual prompting, where questions about visual content…

VGC-Bench: Towards Mastering Diverse Team Strategies in Competitive Pokémon

arXiv:2506.10326v3 Announce Type: replace-cross Abstract: Developing AI agents that can robustly adapt to varying strategic landscapes without retraining is a central challenge in multi-agent learning. Pokémon Video Game Championships (VGC) is a domain with a vast space of approximately $10^{139}$…

Directed Homophily-Aware Graph Neural Network

arXiv:2505.22362v3 Announce Type: replace Abstract: Graph Neural Networks (GNNs) have achieved significant success in various learning tasks on graph-structured data. Nevertheless, most GNNs struggle to generalize to heterophilic neighborhoods. Additionally, many GNNs ignore the directional nature of real-world graphs, resulting…