Archives AI News

SentinelBench: A Benchmark for Long-Running Monitoring Agents

arXiv:2606.05342v1 | Announce Type: new
Abstract: AI agents are increasingly asked to carry out work that spans minutes, hours, or longer. Yet the default model of agent behavior is continuous action: issuing tool calls, refreshing pages, searching for alternatives, or otherwise…

Learning Adaptive Parallel Execution for Efficient Code Localization

arXiv:2601.19568v2 | Announce Type: replace
Abstract: Code localization constitutes a key bottleneck in automated software development pipelines. While concurrent tool execution can enhance discovery speed, current agents demonstrate a 34.9% redundant invocation rate, which negates parallelism benefits. We propose FuseSearch, reformulating…

OPRD: On-Policy Representation Distillation

arXiv:2606.06021v1 | Announce Type: cross
Abstract: On-policy distillation (OPD) supervises the student only in output space by matching next-token probabilities. This output-only paradigm has two limits: (1) sampling variance from Monte Carlo KL estimates over large vocabularies (e.g., Qwen’s ~150k tokens)…
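To illustrate limit (1), here is a minimal sketch, not the OPRD authors' code, of why a sampled KL estimate is noisy over a large vocabulary. It assumes the reverse KL(student || teacher) commonly used in on-policy distillation, random Gaussian logits standing in for real student and teacher models, and a 150,000-token vocabulary chosen only to echo the "~150k tokens" figure. Each single-token estimate log p_s(x) - log p_t(x), with x drawn from the student, is unbiased, but its spread is large compared with the exact value it estimates.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def exact_reverse_kl(p_student: np.ndarray, p_teacher: np.ndarray) -> float:
    # KL(student || teacher) = sum_x p_s(x) * (log p_s(x) - log p_t(x)).
    return float(np.sum(p_student * (np.log(p_student) - np.log(p_teacher))))

def single_sample_kl_estimates(p_student, p_teacher, n_samples, rng):
    # Draw tokens on-policy from the student; each log-ratio is an
    # unbiased one-sample Monte Carlo estimate of the reverse KL.
    idx = rng.choice(len(p_student), size=n_samples, p=p_student)
    return np.log(p_student[idx]) - np.log(p_teacher[idx])

rng = np.random.default_rng(0)
vocab_size = 150_000  # illustrative assumption, not a real tokenizer
student = softmax(rng.normal(size=vocab_size))  # hypothetical student
teacher = softmax(rng.normal(size=vocab_size))  # hypothetical teacher

exact = exact_reverse_kl(student, teacher)
estimates = single_sample_kl_estimates(student, teacher, 2000, rng)
mc_mean = float(estimates.mean())  # close to the exact value on average
mc_std = float(estimates.std())    # spread of an individual estimate

print(f"exact={exact:.3f}  mc_mean={mc_mean:.3f}  per-sample std={mc_std:.3f}")
```

Averaging many samples recovers the exact value, but a per-token training signal typically relies on one sampled token per position, so the per-sample standard deviation is the quantity that matters for gradient noise.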

Beyond Rewards in Reinforcement Learning for Cyber Defence

arXiv:2602.04809v3 | Announce Type: replace-cross
Abstract: Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gym environments using dense, highly engineered…

Synthetic Contrastive Reasoning for Multi-Table Q&A

arXiv:2606.05382v1 | Announce Type: new
Abstract: Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional reasoning across relational tables. Existing multi-table Q&A resources typically provide questions and final answers but lack reasoning supervision that explains how…