Archives AI News

ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration

arXiv:2509.21823v1 Announce Type: new Abstract: Reward is critical to the evaluation and training of large language models (LLMs). However, existing rule-based or model-based reward methods struggle to generalize to GUI agents, where access to ground-truth trajectories or application databases is…

September 29, 2025

Positional Encoding via Token-Aware Phase Attention

arXiv:2509.12635v2 Announce Type: replace-cross Abstract: We prove under practical assumptions that Rotary Positional Embedding (RoPE) introduces an intrinsic distance-dependent bias in attention scores that limits RoPE’s ability to model long contexts. RoPE extension methods may alleviate this issue, but they typically…

September 29, 2025

DS-STAR: Data Science Agent via Iterative Planning and Verification

arXiv:2509.21825v1 Announce Type: new Abstract: Data science, which transforms raw data into actionable insights, is critical for data-driven decision-making. However, these tasks are often complex, involving steps for exploring multiple data sources and synthesizing findings to deliver insightful answers. While…

September 29, 2025

Explaining multimodal LLMs via intra-modal token interactions

arXiv:2509.22415v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable success across diverse vision-language tasks, yet their internal decision-making mechanisms remain insufficiently understood. Existing interpretability research has primarily focused on cross-modal attribution, identifying which image regions the…

September 29, 2025

Axiomatic Choice and the Decision-Evaluation Paradox

arXiv:2509.21836v1 Announce Type: new Abstract: We introduce a framework for modeling decisions with axioms that are statements about decisions, e.g., ethical constraints. Using our framework we define a taxonomy of decision axioms based on their structural properties and demonstrate a…

September 29, 2025

InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models

arXiv:2509.22536v1 Announce Type: cross Abstract: The immense computational cost of training Large Language Models (LLMs) presents a major barrier to innovation. While FP8 training offers a promising solution with significant theoretical efficiency gains, its widespread adoption has been hindered by…

September 29, 2025

DeepTravel: An End-to-End Agentic Reinforcement Learning Framework for Autonomous Travel Planning Agents

arXiv:2509.21842v1 Announce Type: new Abstract: Travel planning (TP) agents have recently emerged as a building block that interacts with external tools and resources to generate travel itineraries, ensuring an enjoyable user experience. Despite these benefits, existing studies rely on hand…

September 29, 2025

Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity

arXiv:2509.22641v1 Announce Type: cross Abstract: N-gram novelty is widely used to evaluate language models’ ability to generate text outside of their training data. More recently, it has also been adopted as a metric for measuring textual creativity. However, theoretical work…

September 29, 2025

Reimagining Agent-based Modeling with Large Language Model Agents via Shachi

arXiv:2509.21862v1 Announce Type: new Abstract: The study of emergent behaviors in large language model (LLM)-driven multi-agent systems is a critical research challenge, yet progress is limited by a lack of principled methodologies for controlled experimentation. To address this, we introduce…

September 29, 2025

XBOUND: Exploring Capability Boundaries of Device-Control Agents at the State Level

arXiv:2505.21279v2 Announce Type: replace Abstract: Recent advancements in vision-language models have increased interest in Device-Control Agents (DC agents) for managing graphical user interfaces (GUIs). With the growing complexity and integration of such agents into various applications, effective evaluation methods have…

September 29, 2025