Archives AI News

Can MLLMs “Read” What is Missing?

arXiv:2604.21277v1 Announce Type: new Abstract: We introduce MMTR-Bench, a benchmark designed to evaluate the intrinsic ability of Multimodal Large Language Models (MLLMs) to reconstruct masked text directly from visual context. Unlike conventional question-answering tasks, MMTR-Bench eliminates explicit prompts, requiring models…

Building a Precise Video Language with Human-AI Oversight

arXiv:2604.21718v1 Announce Type: cross Abstract: Video-language models (VLMs) learn to reason about the dynamic visual world through natural language. We introduce a suite of open datasets, benchmarks, and recipes for scalable oversight that enable precise video captioning. First, we define…

Ideological Bias in LLMs’ Economic Causal Reasoning

arXiv:2604.21334v1 Announce Type: new Abstract: Do large language models (LLMs) exhibit systematic ideological bias when reasoning about economic causal effects? As LLMs are increasingly used in policy analysis and economic reporting, where directionally correct causal judgments are essential, this question…

A Multi-Stage Warm-Start Deep Learning Framework for Unit Commitment

arXiv:2604.21891v1 Announce Type: cross Abstract: Maintaining instantaneous balance between electricity supply and demand is critical for reliability and grid stability. System operators achieve this through solving the task of Unit Commitment (UC), a high-dimensional, large-scale Mixed-integer Linear Programming (MILP) problem…

ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams

arXiv:2604.15994v2 Announce Type: replace Abstract: Multimodal Large Language Models (MLLMs) excel at recognizing individual visual elements and reasoning over simple linear diagrams. However, when faced with complex topological structures involving branching paths, converging flows, and cyclic dependencies, their reasoning capabilities…

Replay-buffer engineering for noise-robust quantum circuit optimization

arXiv:2604.21863v1 Announce Type: cross Abstract: Deep reinforcement learning (RL) for quantum circuit optimization faces three fundamental bottlenecks: replay buffers that ignore the reliability of temporal-difference (TD) targets, curriculum-based architecture search that triggers a full quantum-classical evaluation at every environment step,…

Speculative Actions: A Lossless Framework for Faster Agentic Systems

arXiv:2510.04371v2 Announce Type: replace Abstract: AI agents are increasingly deployed in complex, interactive environments, yet their runtime remains a major bottleneck for training, evaluation, and real-world use. Typical agent behavior unfolds sequentially, with each action requiring an API call that…