Archives AI News

Self-Augmenting Retrieval for Diffusion Language Models

arXiv:2606.06474v1 Announce Type: cross Abstract: Discrete diffusion language models generate text by iteratively denoising an entire response in parallel. At each step, they predict tentative tokens for every masked position, committing the confident predictions to the output and discarding the…

June 6, 2026

LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

arXiv:2606.05400v1 Announce Type: new Abstract: Long-horizon autoformalization of research mathematics fails not only at hard lemmas, but at scale: statements drift, dependencies tangle, context decays, and local repairs corrupt distant work. We present LeanMarathon, a multi-agent harness for reliable research-level…

June 6, 2026

Controllable and Verifiable Process Data Synthesis for Process Reward Models

arXiv:2605.02395v2 Announce Type: replace Abstract: Process reward models (PRMs) rely on high-quality process supervision data, yet existing construction methods often provide limited control over error location, error type, and trajectory consistency. We propose a controllable and verifiable framework for synthesizing…

June 6, 2026

Harnessing Generalist Agents for Contextualized Time Series

arXiv:2606.05404v1 Announce Type: new Abstract: Time series are often embedded in rich contexts that are essential for holistic modeling. Moreover, real-world practitioners often require end-to-end workflows for analyzing temporal dynamics, where widely studied tasks such as forecasting are only one…

June 6, 2026

Agents’ Last Exam

arXiv:2606.05405v1 Announce Type: new Abstract: Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue that this gap is largely an…

June 6, 2026

Towards the Readability of LLM-Generated Codes through Multitask Representation Engineering

arXiv:2606.06214v1 Announce Type: cross Abstract: Correctness and readability are key measures of code quality, respectively ensuring functional fidelity and ease of comprehension. While most existing research focuses on improving the correctness of large language models (LLMs) generated codes, readability remains under-addressed.…

June 6, 2026

Learning Long Range Spatio-Temporal Representations over Continuous Time Dynamic Graphs with State Space Models

arXiv:2606.04672v2 Announce Type: replace-cross Abstract: Continuous-time dynamic graphs (CTDGs) provide a richer framework to capture fine-grained temporal patterns in evolving relational data. Long-range information propagation is a key challenge while learning representations, wherein it is important to retain and update…

June 6, 2026

OPRD: On-Policy Representation Distillation

arXiv:2606.06021v1 Announce Type: cross Abstract: On-policy distillation (OPD) supervises the student only in output space by matching next-token probabilities. This output-only paradigm has two limits: (1) sampling variance from Monte Carlo KL estimates over large vocabularies (e.g., Qwen’s ~150k tokens)…

June 6, 2026

Beyond Rewards in Reinforcement Learning for Cyber Defence

arXiv:2602.04809v3 Announce Type: replace-cross Abstract: Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gym environments using dense, highly engineered…

June 6, 2026
