Archives AI News

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

arXiv:2606.06096v1 Announce Type: cross Abstract: Policy-gradient methods usually optimize expected return, but many real-world applications care about distributional properties of returns: tail risk, outlier robustness, or best-of-K discovery. We introduce OrderGrad, a family of likelihood-ratio and reparameterization gradient estimators…

June 6, 2026

Towards the Readability of LLM-Generated Codes through Multitask Representation Engineering

arXiv:2606.06214v1 Announce Type: cross Abstract: Correctness and readability are key measures of code quality, respectively ensuring functional fidelity and ease of comprehension. While most existing research focuses on improving the correctness of code generated by large language models (LLMs), readability remains under-addressed.…

June 6, 2026

Learning Long Range Spatio-Temporal Representations over Continuous Time Dynamic Graphs with State Space Models

arXiv:2606.04672v2 Announce Type: replace-cross Abstract: Continuous-time dynamic graphs (CTDGs) provide a richer framework to capture fine-grained temporal patterns in evolving relational data. Long-range information propagation is a key challenge while learning representations, wherein it is important to retain and update…

June 6, 2026

OPRD: On-Policy Representation Distillation

arXiv:2606.06021v1 Announce Type: cross Abstract: On-policy distillation (OPD) supervises the student only in output space by matching next-token probabilities. This output-only paradigm has two limits: (1) sampling variance from Monte Carlo KL estimates over large vocabularies (e.g., Qwen’s ~150k tokens)…

June 6, 2026

Beyond Rewards in Reinforcement Learning for Cyber Defence

arXiv:2602.04809v3 Announce Type: replace-cross Abstract: Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gym environments using dense, highly engineered…

June 6, 2026

Query-efficient model evaluation using cached responses

arXiv:2605.07096v2 Announce Type: replace-cross Abstract: Evaluating a new model on an existing benchmark is often necessary to understand its behavior before deployment. For modern evaluation frameworks, generating and evaluating a response for all queries can be prohibitively expensive. In practice,…

June 6, 2026

Learning Adaptive Parallel Execution for Efficient Code Localization

arXiv:2601.19568v2 Announce Type: replace Abstract: Code localization constitutes a key bottleneck in automated software development pipelines. While concurrent tool execution can enhance discovery speed, current agents demonstrate a 34.9% redundant invocation rate, which negates parallelism benefits. We propose FuseSearch, reformulating…

June 6, 2026

Binary Gaussian Copula Synthesis: an LLM-powered data augmentation framework for early dialysis prediction in chronic kidney disease

arXiv:2403.00965v2 Announce Type: replace-cross Abstract: Only a small fraction of patients with chronic kidney disease (CKD) progress to dialysis, creating severe class imbalance that limits the performance of machine learning models for early dialysis prediction. This challenge is compounded by…

June 6, 2026
