Archives AI News

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

arXiv:2505.09558v2 Announce Type: replace-cross Abstract: End-to-end spoken dialogue models such as GPT-4o-audio have recently garnered significant attention in the speech domain. However, the evaluation of spoken dialogue models’ conversational performance has largely been overlooked. This is primarily due to the…

September 24, 2025

Gödel Test: Can Large Language Models Solve Easy Conjectures?

arXiv:2509.18383v1 Announce Type: new Abstract: Recent announcements from frontier AI model labs have highlighted strong results on high-school and undergraduate math competitions. Yet it remains unclear whether large language models can solve new, simple conjectures in more advanced areas of…

September 24, 2025

Generative Medical Event Models Improve with Scale

arXiv:2508.12104v2 Announce Type: replace-cross Abstract: Realizing personalized medicine at scale calls for methods that distill insights from longitudinal patient journeys, which can be viewed as a sequence of medical events. Foundation models pretrained on large-scale medical event data represent a…

September 24, 2025

ATLAS: Benchmarking and Adapting LLMs for Global Trade via Harmonized Tariff Code Classification

arXiv:2509.18400v1 Announce Type: new Abstract: Accurate classification of products under the Harmonized Tariff Schedule (HTS) is a critical bottleneck in global trade, yet it has received little attention from the machine learning community. Misclassification can halt shipments entirely, with major…

September 24, 2025

Instruction-Following Evaluation in Function Calling for Large Language Models

arXiv:2509.18420v1 Announce Type: new Abstract: Function calling is a core capability of large language models, essential for AI agents. Existing benchmarks such as the Berkeley Function Calling Leaderboard (BFCL), tau^2-Bench (arXiv:2506.07982), and ACEBench (arXiv:2501.12851) evaluate argument correctness but do not…

September 24, 2025

Fine-Grained Detection of AI-Generated Text Using Sentence-Level Segmentation

arXiv:2509.17830v2 Announce Type: replace-cross Abstract: Generation of Artificial Intelligence (AI) texts in important works has become a common practice that can be used to misuse and abuse AI at various levels. Traditional AI detectors often rely on document-level classification, which…

September 24, 2025

Memory-QA: Answering Recall Questions Based on Multimodal Memories

arXiv:2509.18436v1 Announce Type: new Abstract: We introduce Memory-QA, a novel real-world task that involves answering recall questions about visual content from previously stored multimodal memories. This task poses unique challenges, including the creation of task-oriented memories, the effective utilization of…

September 24, 2025

Tackling GNARLy Problems: Graph Neural Algorithmic Reasoning Reimagined through Reinforcement Learning

arXiv:2509.18930v1 Announce Type: cross Abstract: Neural Algorithmic Reasoning (NAR) is a paradigm that trains neural networks to execute classic algorithms by supervised learning. Despite its successes, important limitations remain: inability to construct valid solutions without post-processing and to reason about…

September 24, 2025

FERA: Foil Fencing Referee Assistant Using Pose-Based Multi-Label Move Recognition and Rule Reasoning

arXiv:2509.18527v1 Announce Type: new Abstract: The sport of fencing, like many other sports, faces challenges in refereeing: subjective calls, human errors, bias, and limited availability in practice environments. We present FERA (Fencing Referee Assistant), a prototype AI referee for foil…

September 24, 2025

Reduced-Order Model-Guided Reinforcement Learning for Demonstration-Free Humanoid Locomotion

arXiv:2509.19023v1 Announce Type: cross Abstract: We introduce Reduced-Order Model-Guided Reinforcement Learning (ROM-GRL), a two-stage reinforcement learning framework for humanoid walking that requires no motion capture data or elaborate reward shaping. In the first stage, a compact 4-DOF (four-degree-of-freedom) reduced-order model…

September 24, 2025