Archives AI News

How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization

arXiv:2602.19208v2 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for Large Language Model (LLM) reasoning, yet current methods face key challenges in resource allocation and policy optimization dynamics: (i) uniform rollout allocation ignores gradient variance…

April 24, 2026

SCM: Sleep-Consolidated Memory with Algorithmic Forgetting for Large Language Models

arXiv:2604.20943v1 Announce Type: new Abstract: We present SCM (Sleep-Consolidated Memory), a research preview of a memory architecture for large language models that draws on neuroscientific principles to address a fundamental limitation in current systems: the absence of persistent, structured, and…

April 24, 2026

IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning

arXiv:2604.20933v1 Announce Type: new Abstract: Self-play fine-tuning enables large language models to improve beyond supervised fine-tuning without additional human annotations by contrasting annotated responses with self-generated ones. Many existing methods rely on a fixed divergence regime. SPIN is closely related…

April 24, 2026

Neural surrogates for crystal growth dynamics with variable supersaturation: explicit vs. implicit conditioning

arXiv:2604.21753v1 Announce Type: cross Abstract: Simulations of crystal growth are performed by using Convolutional Recurrent Neural Network surrogate models, trained on a dataset of time sequences computed by numerical integration of Allen-Cahn dynamics including faceting via kinetic anisotropy. Two network…

April 24, 2026

ICNN-enhanced 2SP: Leveraging input convex neural networks for solving two-stage stochastic programming

arXiv:2505.05261v3 Announce Type: replace-cross Abstract: Two-stage stochastic programming (2SP) offers a basic framework for modelling decision-making under uncertainty, yet scalability remains a challenge due to the computational complexity of recourse function evaluation. Existing learning-based methods like Neural Two-Stage Stochastic Programming…

April 24, 2026

LAF-Based Evaluation and UTTL-Based Learning Strategies with MIATTs

arXiv:2604.20944v1 Announce Type: new Abstract: In many real-world machine learning (ML) applications, the true target cannot be precisely defined due to ambiguous or subjective information. To address this challenge, under the assumption that the true target for a given ML…

April 24, 2026

Data-Driven Open-Loop Simulation for Digital-Twin Operator Decision Support in Wastewater Treatment

arXiv:2604.20935v1 Announce Type: new Abstract: Wastewater treatment plants (WWTPs) need digital-twin-style decision support tools that can simulate plant response under prescribed control plans, tolerate irregular and missing sensing, and remain informative over 12-36 h planning horizons. Meeting these requirements with…

April 24, 2026

Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors

arXiv:2604.21893v1 Announce Type: cross Abstract: Geographic context is often considered relevant to motor insurance risk, yet public actuarial datasets provide limited location identifiers, constraining how this information can be incorporated and evaluated in claim-frequency models. This study examines how geographic…

April 24, 2026

Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards

arXiv:2510.18731v2 Announce Type: replace-cross Abstract: Large Language Models demonstrate strong capabilities in single-turn instruction following but suffer from Lost-in-Conversation (LiC), a degradation in performance as information is revealed progressively in multi-turn settings. Motivated by the current progress on Reinforcement Learning…

April 24, 2026

Early Detection of Latent Microstructure Regimes in Limit Order Books

arXiv:2604.20949v1 Announce Type: new Abstract: Limit order books can transition rapidly from stable to stressed conditions, yet standard early-warning signals such as order flow imbalance and short-term volatility are inherently reactive. We formalise this limitation via a three-regime causal data-generating…

April 24, 2026