Archives AI News

SentinelBench: A Benchmark for Long-Running Monitoring Agents

arXiv:2606.05342v1 Announce Type: new Abstract: AI agents are increasingly asked to carry out work that spans minutes, hours, or longer. Yet the default model of agent behavior is continuous action: issuing tool calls, refreshing pages, searching for alternatives, or otherwise…

June 6, 2026

Beyond Rewards in Reinforcement Learning for Cyber Defence

arXiv:2602.04809v3 Announce Type: replace-cross Abstract: Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gym environments using dense, highly engineered…

June 6, 2026

OPRD: On-Policy Representation Distillation

arXiv:2606.06021v1 Announce Type: cross Abstract: On-policy distillation (OPD) supervises the student only in output space by matching next-token probabilities. This output-only paradigm has two limits: (1) sampling variance from Monte Carlo KL estimates over large vocabularies (e.g., Qwen’s ~150k tokens)…

June 6, 2026

Towards the Readability of LLM-Generated Codes through Multitask Representation Engineering

arXiv:2606.06214v1 Announce Type: cross Abstract: Correctness and readability are key measures of code quality, respectively ensuring functional fidelity and ease of comprehension. While most existing research focuses on improving the correctness of large language models (LLMs) generated codes, readability remains under-addressed.…

June 6, 2026

Binary Gaussian Copula Synthesis: an LLM-powered data augmentation framework for early dialysis prediction in chronic kidney disease

arXiv:2403.00965v2 Announce Type: replace-cross Abstract: Only a small fraction of patients with chronic kidney disease (CKD) progress to dialysis, creating severe class imbalance that limits the performance of machine learning models for early dialysis prediction. This challenge is compounded by…

June 6, 2026

An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)

arXiv:2606.05357v1 Announce Type: new Abstract: Purpose: To develop an interpretable and trustworthy AI framework that combines deep learning based MRI Osteoarthritis Knee Score (MOAKS) prediction with interpretable statistical modeling to study structure-pain relationships at scale using data from the Osteoarthritis…

June 6, 2026

Query-efficient model evaluation using cached responses

arXiv:2605.07096v2 Announce Type: replace-cross Abstract: Evaluating a new model on an existing benchmark is often necessary to understand its behavior before deployment. For modern evaluation frameworks, generating and evaluating a response for all queries can be prohibitively expensive. In practice,…

June 6, 2026

Learning Long Range Spatio-Temporal Representations over Continuous Time Dynamic Graphs with State Space Models

arXiv:2606.04672v2 Announce Type: replace-cross Abstract: Continuous-time dynamic graphs (CTDGs) provide a richer framework to capture fine-grained temporal patterns in evolving relational data. Long-range information propagation is a key challenge while learning representations, wherein it is important to retain and update…

June 6, 2026

Synthetic Contrastive Reasoning for Multi-Table Q&A

arXiv:2606.05382v1 Announce Type: new Abstract: Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional reasoning across relational tables. Existing multi-table Q&A resources typically provide questions and final answers but lack reasoning supervision that explains how…

June 6, 2026