Archives AI News

A Residual-Aware Theory of Position Bias in Transformers

arXiv:2602.16837v1 (announce type: new). Abstract: Transformer models systematically favor certain token positions, yet the architectural origins of this position bias remain poorly understood. Under causal masking at infinite depth, prior theoretical analyses of attention rollout predict an inevitable collapse of…

Training Large Reasoning Models Efficiently via Progressive Thought Encoding

arXiv:2602.16839v1 (announce type: new). Abstract: Large reasoning models (LRMs) excel on complex problems but face a critical barrier to efficiency: reinforcement learning (RL) training requires long rollouts for outcome-based rewards, where autoregressive decoding dominates time and memory usage. While sliding-window…

Generating Directed Graphs with Dual Attention and Asymmetric Encoding

arXiv:2506.16404v3 (announce type: replace). Abstract: Directed graphs naturally model systems with asymmetric, ordered relationships, essential to applications in biology, transportation, social networks, and visual understanding. Generating such graphs enables tasks such as simulation, data augmentation, and novel instance discovery; however,…