Archives AI News

Inefficiencies of Meta Agents for Agent Design

arXiv:2510.06711v1 Announce Type: new Abstract: Recent work has begun to automate the design of agentic systems using meta-agents that propose and iteratively refine new agent architectures. In this paper, we examine three key challenges in a common class of meta-agents. First,…

Verifying Memoryless Sequential Decision-making of Large Language Models

arXiv:2510.06756v1 Announce Type: new Abstract: We introduce a tool for rigorous and automated verification of large language model (LLM)-based policies in memoryless sequential decision-making tasks. Given a Markov decision process (MDP) representing the sequential decision-making task, an LLM policy,…

Evolving and Executing Research Plans via Double-Loop Multi-Agent Collaboration

arXiv:2510.06761v1 Announce Type: new Abstract: Automating the end-to-end scientific research process poses a fundamental challenge: it requires both evolving high-level plans that are novel and sound, and executing these plans correctly amidst dynamic and uncertain conditions. To address this bilevel…

Valid Inference with Imperfect Synthetic Data

arXiv:2508.06635v2 Announce Type: replace-cross Abstract: Predictions and generations from large language models are increasingly being explored as an aid in limited data regimes, such as in computational social science and human subjects research. While prior technical work has mainly explored…

Autoformalizer with Tool Feedback

arXiv:2510.06857v1 Announce Type: new Abstract: Autoformalization addresses the scarcity of data for Automated Theorem Proving (ATP) by translating mathematical problems from natural language into formal statements. Efforts in recent work shift from directly prompting large language models to training an…

TGPR: Tree-Guided Policy Refinement for Robust Self-Debugging of LLMs

arXiv:2510.06878v1 Announce Type: new Abstract: Iterative refinement has been a promising paradigm to enable large language models (LLMs) to resolve difficult reasoning and problem-solving tasks. One of the key challenges, however, is how to effectively search through the enormous search…