Archives AI News

Learning to Orchestrate Agents in Natural Language with the Conductor

arXiv:2512.04388v1 Announce Type: new Abstract: Powerful large language models (LLMs) from different providers have been expensively trained and finetuned to specialize across varying domains. In this work, we introduce a new kind of Conductor model trained with reinforcement learning to…

DAVE: Diagnostic benchmark for Audio Visual Evaluation

arXiv:2503.09321v2 Announce Type: replace-cross Abstract: Audio-visual understanding is a rapidly evolving field that seeks to integrate and interpret information from both auditory and visual modalities. Despite recent advances in multi-modal learning, existing benchmarks often suffer from strong visual bias —…

GraphBench: Next-generation graph learning benchmarking

arXiv:2512.04475v1 Announce Type: new Abstract: Machine learning on graphs has recently achieved impressive progress in various domains, including molecular property prediction and chip design. However, benchmarking practices remain fragmented, often relying on narrow, task-specific datasets and inconsistent evaluation protocols, which…

An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems

arXiv:2508.08833v3 Announce Type: replace-cross Abstract: In this paper, we introduce a systematic framework that goes beyond conventional methods to assess LLMs’ mathematical-reasoning robustness by stress-testing them on advanced math problems that are mathematically equivalent but vary linguistically and parametrically. These transformations…

Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems

arXiv:2512.04476v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models scale large language models through conditional computation, but inference becomes memory-bound once expert weights exceed the capacity of GPU memory. In this case, weights must be offloaded to external memory, and fetching…

Prototype-Based Semantic Consistency Alignment for Domain Adaptive Retrieval

arXiv:2512.04524v1 Announce Type: new Abstract: Domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain, enabling effective retrieval while mitigating domain discrepancies. However, existing methods encounter several fundamental limitations: 1) neglecting class-level semantic…

The Peril of Preference: Why GRPO fails on Ordinal Rewards

arXiv:2511.04439v2 Announce Type: replace-cross Abstract: The simplicity of Group-relative Policy Optimization (GRPO) makes it highly desirable for adapting LLMs to become experts at specific tasks. But this simplicity also makes it ill-specified as we seek to enhance RL training with richer, non-binary…