Archives: AI News

FunAudio-ASR Technical Report

arXiv:2509.12508v2 Announce Type: replace-cross Abstract: In recent years, automatic speech recognition (ASR) has witnessed transformative advancements driven by three complementary paradigms: data scaling, model size scaling, and deep integration with large language models (LLMs). However, LLMs are prone to hallucination,…

September 19, 2025

Predicting Multi-Agent Specialization via Task Parallelizability

arXiv:2503.15703v2 Announce Type: replace-cross Abstract: When should we encourage specialization in multi-agent systems versus train generalists that perform the entire task independently? We propose that specialization largely depends on task parallelizability: the potential for multiple agents to execute task components…

September 19, 2025

“What’s Up, Doc?”: Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets

arXiv:2506.21532v2 Announce Type: replace-cross Abstract: People are increasingly seeking healthcare information from large language models (LLMs) via interactive chatbots, yet the nature and inherent risks of these conversations remain largely unexplored. In this paper, we filter large-scale conversational AI datasets…

September 19, 2025

Rationality Check! Benchmarking the Rationality of Large Language Models

arXiv:2509.14546v1 Announce Type: new Abstract: Large language models (LLMs), a recent advance in deep learning and machine intelligence, have manifested astonishing capacities, now considered among the most promising for artificial general intelligence. With human-like capabilities, LLMs have been used to…

September 19, 2025

EXPLOR: Extrapolatory Pseudo-Label Matching for Out-of-distribution Uncertainty Based Rejection

arXiv:2406.01825v4 Announce Type: replace-cross Abstract: EXPLOR is a novel framework that utilizes support-expanding, extrapolatory pseudo-labeling to improve prediction and uncertainty-based rejection on out-of-distribution (OOD) points. EXPLOR utilizes a diverse set of base models as pseudo-labelers on the expansive augmented data…

September 19, 2025

Beyond the high score: Prosocial ability profiles of multi-agent populations

arXiv:2509.14485v1 Announce Type: new Abstract: The development and evaluation of social capabilities in AI agents require complex environments where competitive and cooperative behaviours naturally emerge. While game-theoretic properties can explain why certain teams or agent populations outperform others, more abstract…

September 19, 2025

DeKeyNLU: Enhancing Natural Language to SQL Generation through Task Decomposition and Keyword Extraction

arXiv:2509.14507v1 Announce Type: new Abstract: Natural Language to SQL (NL2SQL) provides a new model-centric paradigm that simplifies database access for non-technical users by converting natural language queries into SQL commands. Recent advancements, particularly those integrating Retrieval-Augmented Generation (RAG) and Chain-of-Thought…

September 19, 2025

From Mimicry to True Intelligence (TI) – A New Paradigm for Artificial General Intelligence

arXiv:2509.14474v1 Announce Type: new Abstract: The debate around Artificial General Intelligence (AGI) remains open due to two fundamentally different goals: replicating human-like performance versus replicating human-like cognitive processes. We argue that current performance-based definitions are inadequate because they provide no…

September 19, 2025

VCBench: Benchmarking LLMs in Venture Capital

arXiv:2509.14448v1 Announce Type: new Abstract: Benchmarks such as SWE-bench and ARC-AGI demonstrate how shared datasets accelerate progress toward artificial general intelligence (AGI). We introduce VCBench, the first benchmark for predicting founder success in venture capital (VC), a domain where signals…

September 19, 2025

Detecting Pipeline Failures through Fine-Grained Analysis of Web Agents

arXiv:2509.14382v1 Announce Type: new Abstract: Web agents powered by large language models (LLMs) can autonomously perform complex, multistep tasks in dynamic web environments. However, current evaluations mostly focus on the overall success while overlooking intermediate errors. This limits insight into…

September 19, 2025