YRC-Bench: A Benchmark for Learning to Coordinate with Experts
arXiv:2502.09583v3
Abstract: When deployed in the real world, AI agents will inevitably face challenges that exceed their individual capabilities. A critical component of AI safety is an agent’s ability to recognize when it is likely to fail…
