Archives: AI News

LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

arXiv:2606.05400v1 Announce Type: new Abstract: Long-horizon autoformalization of research mathematics fails not only at hard lemmas, but at scale: statements drift, dependencies tangle, context decays, and local repairs corrupt distant work. We present LeanMarathon, a multi-agent harness for reliable research-level…

June 6, 2026

Controllable and Verifiable Process Data Synthesis for Process Reward Models

arXiv:2605.02395v2 Announce Type: replace Abstract: Process reward models (PRMs) rely on high-quality process supervision data, yet existing construction methods often provide limited control over error location, error type, and trajectory consistency. We propose a controllable and verifiable framework for synthesizing…

June 6, 2026

Harnessing Generalist Agents for Contextualized Time Series

arXiv:2606.05404v1 Announce Type: new Abstract: Time series are often embedded in rich contexts that are essential for holistic modeling. Moreover, real-world practitioners often require end-to-end workflows for analyzing temporal dynamics, where widely studied tasks such as forecasting are only one…

June 6, 2026

Agents’ Last Exam

arXiv:2606.05405v1 Announce Type: new Abstract: Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue that this gap is largely an…

June 6, 2026

ECI: Effective Contrastive Information to Evaluate Hard-Negatives

arXiv:2603.20990v2 Announce Type: replace-cross Abstract: Hard-negative source selection for dense retrieval is usually decided only after fine-tuning and downstream evaluation. We propose Effective Contrastive Information (ECI), a training-free diagnostic that ranks candidate negative sources using frozen target-encoder embeddings. ECI is…

June 6, 2026

Mutation Without Variation: Convergence Dynamics in LLM-Driven Program Evolution

arXiv:2606.05408v1 Announce Type: new Abstract: When an LLM repeatedly mutates a program, does it explore new forms or circle back to the same ones? We study this question by analyzing LLM-driven mutation chains in the absence of selection pressure within…

June 6, 2026

Extreme Region Policy Distillation

arXiv:2605.25582v2 Announce Type: replace-cross Abstract: Reinforcement learning for large language models faces a fundamental trade-off between sample efficiency and asymptotic performance: strictly on-policy methods discard trajectories after a single update, while off-policy reuse introduces distribution mismatch that existing trust-region techniques…

June 6, 2026

A Motivational Architecture for Conversational AGI

arXiv:2606.05411v1 Announce Type: new Abstract: Motivational architectures in cognitive AI have largely been designed for physical agents regulating bodily needs. Conversational agents operate in a different regime: their sensorimotor loop is linguistic, their environment is a user’s evolving mental state,…

June 6, 2026

To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection

arXiv:2606.05931v1 Announce Type: cross Abstract: When retrieving a person from a video archive by voice and face, should the system be multimodal or not? In real-world broadcast archives, unlike curated benchmarks, a target may be heard but unseen, seen but…

June 6, 2026

Assessing the Carbon Emissions and Energy Consumption of U.S. Hyperscale Data Centers

arXiv:2606.05420v1 Announce Type: new Abstract: The rapid proliferation of hyperscale data centers (HDCs) in the US, mainly driven by the adoption of artificial intelligence, has raised concerns about this industry’s environmental footprint. We compiled facility-level information on 403 US hyperscale…

June 6, 2026