Archives AI News

Can MLLMs “Read” What is Missing?

arXiv:2604.21277v1 Announce Type: new Abstract: We introduce MMTR-Bench, a benchmark designed to evaluate the intrinsic ability of Multimodal Large Language Models (MLLMs) to reconstruct masked text directly from visual context. Unlike conventional question-answering tasks, MMTR-Bench eliminates explicit prompts, requiring models…

April 25, 2026

Building a Precise Video Language with Human-AI Oversight

arXiv:2604.21718v1 Announce Type: cross Abstract: Video-language models (VLMs) learn to reason about the dynamic visual world through natural language. We introduce a suite of open datasets, benchmarks, and recipes for scalable oversight that enable precise video captioning. First, we define…

April 25, 2026

Spatial Metaphors for LLM Memory: A Critical Analysis of the MemPalace Architecture

arXiv:2604.21284v1 Announce Type: new Abstract: MemPalace is an open-source AI memory system that applies the ancient method of loci (memory palace) spatial metaphor to organize long-term memory for large language models; launched in April 2026, it accumulated over 47,000 GitHub…

April 25, 2026

Divide-then-Diagnose: Weaving Clinician-Inspired Contexts for Ultra-Long Capsule Endoscopy Videos

arXiv:2604.21814v1 Announce Type: cross Abstract: Capsule endoscopy (CE) enables non-invasive gastrointestinal screening, but current CE research remains largely limited to frame-level classification and detection, leaving video-level analysis underexplored. To bridge this gap, we introduce and formally define a new task,…

April 25, 2026

Ideological Bias in LLMs’ Economic Causal Reasoning

arXiv:2604.21334v1 Announce Type: new Abstract: Do large language models (LLMs) exhibit systematic ideological bias when reasoning about economic causal effects? As LLMs are increasingly used in policy analysis and economic reporting, where directionally correct causal judgments are essential, this question…

April 25, 2026

A Multi-Stage Warm-Start Deep Learning Framework for Unit Commitment

arXiv:2604.21891v1 Announce Type: cross Abstract: Maintaining instantaneous balance between electricity supply and demand is critical for reliability and for preventing grid instability. System operators achieve this by solving the Unit Commitment (UC) task, a high-dimensional, large-scale Mixed-Integer Linear Programming (MILP) problem…

April 25, 2026

ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams

arXiv:2604.15994v2 Announce Type: replace Abstract: Multimodal Large Language Models (MLLMs) excel at recognizing individual visual elements and reasoning over simple linear diagrams. However, when faced with complex topological structures involving branching paths, converging flows, and cyclic dependencies, their reasoning capabilities…

April 25, 2026

Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination

arXiv:2506.21546v4 Announce Type: replace-cross Abstract: Segmentation Vision-Language Models (VLMs) have significantly advanced grounded visual understanding, yet they remain prone to pixel-grounding hallucinations, producing masks for incorrect objects or for objects that are entirely absent. Existing evaluations rely almost entirely on…

April 25, 2026

Replay-buffer engineering for noise-robust quantum circuit optimization

arXiv:2604.21863v1 Announce Type: cross Abstract: Deep reinforcement learning (RL) for quantum circuit optimization faces three fundamental bottlenecks: replay buffers that ignore the reliability of temporal-difference (TD) targets, curriculum-based architecture search that triggers a full quantum-classical evaluation at every environment step,…

April 25, 2026

Speculative Actions: A Lossless Framework for Faster Agentic Systems

arXiv:2510.04371v2 Announce Type: replace Abstract: AI agents are increasingly deployed in complex, interactive environments, yet their runtime remains a major bottleneck for training, evaluation, and real-world use. Typical agent behavior unfolds sequentially, with each action requiring an API call that…

April 25, 2026