Archives AI News

The Effect of Attention Head Count on Transformer Approximation

arXiv:2510.06662v1 Announce Type: cross Abstract: The Transformer has become the dominant architecture for sequence modeling, yet a detailed understanding of how its structural parameters influence expressive power remains limited. In this work, we study the approximation properties of transformers, with particular…
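The structural parameter at issue, the head count, partitions the model dimension into per-head subspaces: with h heads and model width d, each head attends in a subspace of size d/h. A minimal NumPy sketch of standard multi-head self-attention (generic, not the paper's construction; all names are illustrative) makes the trade-off concrete:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Self-attention over a (seq_len, d_model) input with num_heads heads.

    Each head attends in a subspace of size d_model // num_heads, so a
    larger head count buys more attention patterns over smaller subspaces
    at the same total parameter count.
    """
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # (seq_len, d_model) each

    def split(M):  # carve the model dimension into per-head subspaces
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)    # (heads, seq_len, d_head)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ Vh                   # (heads, seq_len, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)  # concat heads
    return out @ Wo

rng = np.random.default_rng(0)
d_model, seq_len = 8, 5
W = [rng.standard_normal((d_model, d_model)) for _ in range(4)]
X = rng.standard_normal((seq_len, d_model))
# Same weights and parameter count, different head counts.
outs = [multi_head_attention(X, *W, num_heads=h) for h in (1, 2, 4)]
```

With the weight matrices held fixed, only the head count changes how the d_model dimension is factored, which is exactly the kind of structural knob whose effect on approximation power the paper studies.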

Non-Asymptotic Analysis of Efficiency in Conformalized Regression

arXiv:2510.07093v1 Announce Type: cross Abstract: Conformal prediction provides prediction sets with coverage guarantees. The informativeness of conformal prediction depends on its efficiency, typically quantified by the expected size of the prediction set. Prior work on the efficiency of conformalized regression…
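For context, the efficiency the abstract refers to is the width of the prediction set. A minimal sketch of standard split conformal regression (the generic procedure, not this paper's analysis; the data here are synthetic) shows where that width comes from:

```python
import numpy as np

def split_conformal_interval(resid_cal, y_pred_test, alpha=0.1):
    """Split conformal prediction intervals for regression.

    resid_cal: absolute residuals |y - yhat| on a held-out calibration set.
    Coverage >= 1 - alpha holds under exchangeability; the interval
    width (2 * qhat) is the "efficiency" quantity.
    """
    n = len(resid_cal)
    # Finite-sample corrected quantile level, ceil((n+1)(1-alpha))/n.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    qhat = np.quantile(resid_cal, level)
    return y_pred_test - qhat, y_pred_test + qhat

rng = np.random.default_rng(1)
y_true = rng.normal(size=1000)
y_pred = y_true + rng.normal(scale=0.5, size=1000)  # imperfect model
resid = np.abs(y_true - y_pred)
lo, hi = split_conformal_interval(resid, y_pred_test=np.array([0.0]))
```

The interval is centered on the point prediction with half-width equal to a calibrated quantile of the residuals, so a sharper underlying model directly yields smaller (more efficient) sets.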

jmstate, a Flexible Python Package for Multi-State Joint Modeling

arXiv:2510.07128v1 Announce Type: cross Abstract: Classical joint modeling approaches often rely on competing risks or recurrent event formulations to account for complex real-world processes involving evolving longitudinal markers and discrete event occurrences. However, these frameworks typically capture only limited aspects…

On residual network depth

arXiv:2510.03470v2 Announce Type: replace-cross Abstract: Deep residual architectures, such as ResNet and the Transformer, have enabled models of unprecedented depth, yet a formal understanding of why depth is so effective remains an open question. A popular intuition, following Veit et…
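The Veit et al. intuition the abstract references is that a stack of L residual blocks unrolls into an ensemble of 2^L paths. For linear residual blocks this is an exact identity, (I + A_L)···(I + A_1)x = a sum over all subsets of blocks, which a short sketch (illustrative names, toy dimensions) can verify numerically:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
d, L = 4, 3
# Linear residual blocks: x -> x + A_i x. The full network is then a
# product of (I + A_i) factors, which expands into 2**L "paths".
A = [0.1 * rng.standard_normal((d, d)) for _ in range(L)]
x = rng.standard_normal(d)

# Forward pass through the residual stack.
out = x.copy()
for Ai in A:
    out = out + Ai @ out

# Path expansion: one term per subset of blocks, with the chosen
# blocks applied in network order (later blocks act last).
paths = np.zeros(d)
for r in range(L + 1):
    for subset in combinations(range(L), r):
        y = x.copy()
        for i in subset:  # combinations yields indices in increasing order
            y = A[i] @ y
        paths += y
```

The forward pass and the sum over all 2^3 = 8 paths agree exactly, which is the ensemble-of-shallow-paths picture the paper revisits.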

Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning

arXiv:2510.04072v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has become central to enhancing reasoning in large language models (LLMs). Yet on-policy algorithms such as Group Relative Policy Optimization (GRPO) often suffer in early training: noisy gradients from low-quality rollouts lead…
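The GRPO baseline the abstract builds on replaces a learned critic with a group-relative advantage: each rollout's reward is standardized against the other rollouts sampled for the same prompt. A minimal sketch of that advantage computation (the standard GRPO formula, not the Slow-Fast variant the paper proposes):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each rollout's reward against
    the group sampled for the same prompt, so no value function is needed.

    rewards: (num_prompts, group_size) array of scalar rewards.
    """
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# One prompt, four rollouts. When early-training rollouts are uniformly
# low quality, the group std shrinks and the normalized advantages get
# noisy, which is the failure mode the abstract points at.
adv = group_relative_advantages(np.array([[1.0, 0.0, 0.5, 0.5]]))
```

Advantages within each group are zero-mean by construction, so the update pushes probability toward rollouts that beat their own group rather than an absolute reward threshold.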