Archives AI News

Learning to Orchestrate Agents in Natural Language with the Conductor

arXiv:2512.04388v1 Announce Type: new Abstract: Powerful large language models (LLMs) from different providers have been expensively trained and finetuned to specialize across varying domains. In this work, we introduce a new kind of Conductor model trained with reinforcement learning to…

December 5, 2025

Open-Set Domain Adaptation Under Background Distribution Shift: Challenges and A Provably Efficient Solution

arXiv:2512.01152v2 Announce Type: replace Abstract: As we deploy machine learning systems in the real world, a core challenge is to maintain a model that is performant even as the data shifts. Such shifts can take many forms: new classes may…

December 5, 2025

Feature Engineering vs. Deep Learning for Automated Coin Grading: A Comparative Study on Saint-Gaudens Double Eagles

arXiv:2512.04464v1 Announce Type: new Abstract: We challenge the common belief that deep learning always trumps older techniques, using the example of grading Saint-Gaudens Double Eagle gold coins automatically. In our work, we put a feature-based Artificial Neural Network built around…

December 5, 2025

DAVE: Diagnostic benchmark for Audio Visual Evaluation

arXiv:2503.09321v2 Announce Type: replace-cross Abstract: Audio-visual understanding is a rapidly evolving field that seeks to integrate and interpret information from both auditory and visual modalities. Despite recent advances in multi-modal learning, existing benchmarks often suffer from strong visual bias —…

December 5, 2025

GraphBench: Next-generation graph learning benchmarking

arXiv:2512.04475v1 Announce Type: new Abstract: Machine learning on graphs has recently achieved impressive progress in various domains, including molecular property prediction and chip design. However, benchmarking practices remain fragmented, often relying on narrow, task-specific datasets and inconsistent evaluation protocols, which…

December 5, 2025

An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems

arXiv:2508.08833v3 Announce Type: replace-cross Abstract: In this paper, we introduce a systematic framework that goes beyond conventional methods to assess LLMs’ mathematical-reasoning robustness by stress-testing them on advanced math problems that are mathematically equivalent but vary linguistically and parametrically. These transformations…

December 5, 2025

Context-Aware Mixture-of-Experts Inference on CXL-Enabled GPU-NDP Systems

arXiv:2512.04476v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models scale large language models through conditional computation, but inference becomes memory-bound once expert weights exceed the capacity of GPU memory. In this case, weights must be offloaded to external memory, and fetching…

December 5, 2025

Prototype-Based Semantic Consistency Alignment for Domain Adaptive Retrieval

arXiv:2512.04524v1 Announce Type: new Abstract: Domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain, enabling effective retrieval while mitigating domain discrepancies. However, existing methods encounter several fundamental limitations: 1) neglecting class-level semantic…

December 5, 2025

The Peril of Preference: Why GRPO fails on Ordinal Rewards

arXiv:2511.04439v2 Announce Type: replace-cross Abstract: Group-relative Policy Optimization’s (GRPO) simplicity makes it highly desirable for adapting LLMs to become experts at specific tasks. But this simplicity also makes it ill-specified as we seek to enhance RL training with richer, non-binary…

December 5, 2025

SeSE: A Structural Information-Guided Uncertainty Quantification Framework for Hallucination Detection in LLMs

arXiv:2511.16275v2 Announce Type: replace-cross Abstract: Reliable uncertainty quantification (UQ) is essential for deploying large language models (LLMs) in safety-critical scenarios, as it enables them to abstain from responding when uncertain, thereby avoiding “hallucinating” falsehoods. However, state-of-the-art UQ methods primarily rely…

December 5, 2025