Archives AI News

Reasoning About Intent for Ambiguous Requests

arXiv:2511.10453v1 Announce Type: cross Abstract: Large language models often respond to ambiguous requests by implicitly committing to one interpretation. Intent misunderstandings can frustrate users and create safety risks. To address this, we propose generating multiple interpretation-answer pairs in a single…

November 15, 2025

SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Models

arXiv:2511.09993v1 Announce Type: new Abstract: We introduce SPAN, a cross-calendar temporal reasoning benchmark, which requires LLMs to perform intra-calendar temporal reasoning and inter-calendar temporal conversion. SPAN features ten cross-calendar temporal reasoning directions, two reasoning types, and two question formats across…

November 15, 2025

Preview, Accept or Discard? A Predictive Low-Motion Interaction Paradigm

arXiv:2511.10532v1 Announce Type: cross Abstract: Repetitive strain injury (RSI) affects roughly one in five computer users and remains largely unresolved despite decades of ergonomic mouse redesign. All such devices share a fundamental limitation: they still require fine-motor motion to operate.…

November 15, 2025

ChEmREF: Evaluating Language Model Readiness for Chemical Emergency Response

arXiv:2511.10027v1 Announce Type: new Abstract: Emergency responders managing hazardous material (HAZMAT) incidents face critical, time-sensitive decisions, manually navigating extensive chemical guidelines. We investigate whether today’s language models can assist responders by rapidly and reliably understanding critical information, identifying hazards, and…

November 15, 2025

Towards an Agentic Workflow for Internet Measurement Research

arXiv:2511.10611v1 Announce Type: cross Abstract: Internet measurement research faces an accessibility crisis: complex analyses require custom integration of multiple specialized tools that demands specialized domain expertise. When network disruptions occur, operators need rapid diagnostic workflows spanning infrastructure mapping, routing analysis,…

November 15, 2025

Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning

arXiv:2511.10037v1 Announce Type: new Abstract: Existing tool-augmented large language models (LLMs) encounter significant challenges when processing complex queries. Current frameworks such as ReAct are prone to local optimization traps due to their reliance on incremental decision-making processes. To address these…

November 15, 2025

Unlocking Efficient Vehicle Dynamics Modeling via Analytic World Models

arXiv:2502.10012v2 Announce Type: replace Abstract: Differentiable simulators represent an environment’s dynamics as a differentiable function. Within robotics and autonomous driving, this property is used in Analytic Policy Gradients (APG), which relies on backpropagating through the dynamics to train accurate policies…

November 15, 2025

Efficient Thought Space Exploration through Strategic Intervention

arXiv:2511.10038v1 Announce Type: new Abstract: While large language models (LLMs) demonstrate emerging reasoning capabilities, current inference-time expansion methods incur prohibitive computational costs through exhaustive sampling. Through analyzing decoding trajectories, we observe that most next-token predictions align well with the golden…

November 15, 2025

LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild

arXiv:2510.14240v3 Announce Type: replace Abstract: Deep research — producing comprehensive, citation-grounded reports by searching and synthesizing information from hundreds of live web sources — marks an important frontier for agentic systems. To rigorously evaluate this ability, four principles are essential:…

November 15, 2025

Radiology Workflow-Guided Hierarchical Reinforcement Fine-Tuning for Medical Report Generation

arXiv:2511.10065v1 Announce Type: new Abstract: Radiologists compose diagnostic reports through a structured workflow: they describe visual findings, summarize them into impressions, and carefully refine statements in clinically critical cases. However, most existing medical report generation (MRG) systems treat reports as…

November 15, 2025