The Serial Scaling Hypothesis

arXiv:2507.12549v4 Announce Type: replace Abstract: While machine learning has advanced through massive parallelization, we identify a critical blind spot: some problems are fundamentally sequential. These “inherently serial” problems, from mathematical reasoning to physical simulations to sequential decision-making, require sequentially dependent computational steps…

The hidden risks of temporal resampling in clinical reinforcement learning

arXiv:2602.06603v3 Announce Type: replace Abstract: Reinforcement learning (RL) is a type of artificial intelligence for making optimal choices. In healthcare, researchers generally use offline RL (ORL), where models are trained and evaluated from retrospective observational data. To accommodate inherently irregular…

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

arXiv:2603.19470v2 Announce Type: replace Abstract: Off-policy problems such as policy staleness and training–inference mismatch have become a major bottleneck for training stability and further exploration in LLM RL. The distribution gap between the inference and updated policies grows because of…

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

arXiv:2604.26326v1 Announce Type: new Abstract: Reinforcement learning (RL) has unlocked complex reasoning abilities in large language models (LLMs). However, most RL algorithms suffer from performance saturation, preventing further gains as RL training scales. This problem can be characterized by the…