Archives AI News

FunAudio-ASR Technical Report

arXiv:2509.12508v2 (announce type: replace-cross). Abstract: In recent years, automatic speech recognition (ASR) has witnessed transformative advancements driven by three complementary paradigms: data scaling, model size scaling, and deep integration with large language models (LLMs). However, LLMs are prone to hallucination,…

Predicting Multi-Agent Specialization via Task Parallelizability

arXiv:2503.15703v2 (announce type: replace-cross). Abstract: When should we encourage specialization in multi-agent systems versus train generalists that perform the entire task independently? We propose that specialization largely depends on task parallelizability: the potential for multiple agents to execute task components…

Rationality Check! Benchmarking the Rationality of Large Language Models

arXiv:2509.14546v1 (announce type: new). Abstract: Large language models (LLMs), a recent advance in deep learning and machine intelligence, have manifested astonishing capacities, now considered among the most promising for artificial general intelligence. With human-like capabilities, LLMs have been used to…

Beyond the high score: Prosocial ability profiles of multi-agent populations

arXiv:2509.14485v1 (announce type: new). Abstract: The development and evaluation of social capabilities in AI agents require complex environments where competitive and cooperative behaviours naturally emerge. While game-theoretic properties can explain why certain teams or agent populations outperform others, more abstract…

VCBench: Benchmarking LLMs in Venture Capital

arXiv:2509.14448v1 (announce type: new). Abstract: Benchmarks such as SWE-bench and ARC-AGI demonstrate how shared datasets accelerate progress toward artificial general intelligence (AGI). We introduce VCBench, the first benchmark for predicting founder success in venture capital (VC), a domain where signals…

Detecting Pipeline Failures through Fine-Grained Analysis of Web Agents

arXiv:2509.14382v1 (announce type: new). Abstract: Web agents powered by large language models (LLMs) can autonomously perform complex, multistep tasks in dynamic web environments. However, current evaluations mostly focus on overall success while overlooking intermediate errors. This limits insight into…