AI News Archives

The Key to State Reduction in Linear Attention: A Rank-based Perspective

arXiv:2602.04852v2 Announce Type: replace Abstract: Linear attention offers a computationally efficient yet expressive alternative to softmax attention. However, recent empirical results indicate that the hidden state of trained linear attention models often exhibits a low-rank structure, suggesting that these models…
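
To make the low-rank observation concrete, here is a minimal numpy sketch of the vanilla linear-attention state update S_t = S_{t-1} + v_t k_t^T with a numerical-rank probe. The rank measurement and the synthetic low-dimensional keys are our illustration of the phenomenon, not the paper's method.

```python
import numpy as np

def linear_attention_state(keys, values):
    """Accumulate the linear-attention hidden state S = sum_t v_t k_t^T."""
    d_k, d_v = keys.shape[1], values.shape[1]
    S = np.zeros((d_v, d_k))
    for k, v in zip(keys, values):
        S += np.outer(v, k)  # one rank-1 update per token
    return S

def numerical_rank(S, tol=1e-6):
    """Count singular values above tol * largest -- a standard rank proxy."""
    s = np.linalg.svd(S, compute_uv=False)
    return int((s > tol * s[0]).sum())

rng = np.random.default_rng(0)
T, d = 512, 64
# Assumption for illustration: keys concentrate in an 8-dim subspace,
# which caps the rank of the accumulated state at 8.
basis = rng.standard_normal((8, d))
keys = rng.standard_normal((T, 8)) @ basis
values = rng.standard_normal((T, d))
S = linear_attention_state(keys, values)
print(numerical_rank(S))  # prints 8, far below the ambient dimension 64
```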

Zero-Sacrifice Persistent-Robustness Adversarial Defense for Pre-Trained Encoders

arXiv:2602.11204v1 Announce Type: new Abstract: The widespread use of publicly available pre-trained encoders from self-supervised learning (SSL) has exposed a critical vulnerability: their susceptibility to downstream-agnostic adversarial examples (DAEs), which are crafted without knowledge of the downstream tasks but capable…
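
As a rough illustration of what "downstream-agnostic" means, the sketch below runs a PGD-style attack whose objective is purely embedding-space deviation, so no downstream labels are ever consulted. The toy linear encoder and all hyperparameters are stand-ins, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_emb = 128, 32
W = rng.standard_normal((d_emb, d_in)) / np.sqrt(d_in)  # toy stand-in encoder

def encode(x):
    return W @ x

def dae_attack(x, eps=0.05, step=0.01, iters=40):
    """PGD that pushes the embedding away from the clean one.

    The objective uses only the encoder's embedding space -- no downstream
    task enters -- which is what makes the example downstream-agnostic.
    """
    z_clean = encode(x)
    delta = np.zeros_like(x)
    for _ in range(iters):
        # Analytic gradient of ||encode(x + delta) - z_clean||^2 w.r.t. delta
        grad = 2.0 * W.T @ (encode(x + delta) - z_clean)
        delta += step * np.sign(grad)       # L_inf ascent step
        delta = np.clip(delta, -eps, eps)   # project back to the eps-ball
    return x + delta

x = rng.standard_normal(d_in)
x_adv = dae_attack(x)
print(np.linalg.norm(encode(x_adv) - encode(x)))  # large embedding shift
```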

Learning in Structured Stackelberg Games

arXiv:2504.09006v3 Announce Type: replace-cross Abstract: We initiate the study of structured Stackelberg games, a novel form of strategic interaction between a leader and a follower where contextual information can be predictive of the follower’s (unknown) type. Motivated by applications such…
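
A minimal sketch of the leader-follower structure being studied, assuming finite action sets and a context-predicted distribution over follower types: the leader commits to an action, each type best-responds, and the leader optimizes in expectation over types. The utility tables and the hand-coded type distribution are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_leader, n_follower, n_types = 4, 3, 2

# Random utility tables: U[a, b] for the leader, V[t, a, b] for type t.
U = rng.uniform(size=(n_leader, n_follower))
V = rng.uniform(size=(n_types, n_leader, n_follower))

def follower_best_response(t, a):
    """A follower of type t best-responds to the committed leader action a."""
    return int(np.argmax(V[t, a]))

def leader_best_commit(type_probs):
    """Leader picks the pure action maximizing expected utility, where the
    expectation is over the (context-predicted) follower type."""
    best_a, best_u = 0, -np.inf
    for a in range(n_leader):
        u = sum(p * U[a, follower_best_response(t, a)]
                for t, p in enumerate(type_probs))
        if u > best_u:
            best_a, best_u = a, u
    return best_a, best_u

# Context enters through a predicted type distribution; this hand-coded one
# stands in for a learned predictor.
print(leader_best_commit([0.7, 0.3]))
```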

Block-Recurrent Dynamics in Vision Transformers

arXiv:2512.19941v4 Announce Type: replace-cross Abstract: As Vision Transformers (ViTs) become standard vision backbones, a mechanistic account of their computational phenomenology is essential. Despite architectural cues that hint at dynamical structure, there is no settled framework that interprets Transformer depth as…
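
The simplest reading of "depth as dynamics" is a weight-tied block iterated over layers, so that forward depth traces a trajectory of a fixed map. The contractive toy block below is our stand-in for a ViT block and only illustrates the fixed-point trajectory such a framework would track; it is not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# A weight-tied "block": one map applied repeatedly over depth. A real ViT
# block has attention + MLP; a contractive linear map plus a nonlinearity is
# enough to show the depth trajectory settling toward a fixed point.
A = rng.standard_normal((d, d))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))  # scale to spectral radius 0.9

def block(x):
    return np.tanh(A @ x)  # tanh is 1-Lipschitz, so the iteration contracts

x = rng.standard_normal(d)
prev = x
for depth in range(12):
    nxt = block(prev)
    print(depth, np.linalg.norm(nxt - prev))  # step size shrinks with depth
    prev = nxt
```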

Decentralized Non-convex Stochastic Optimization with Heterogeneous Variance

arXiv:2602.11789v1 Announce Type: cross Abstract: Decentralized optimization is critical for solving large-scale machine learning problems over distributed networks, where multiple nodes collaborate through local communication. In practice, the variances of stochastic gradient estimators often differ across nodes, yet their impact…
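
A minimal sketch of decentralized SGD with gossip averaging over a ring of nodes whose gradient-noise levels differ, which is the heterogeneous-variance regime the abstract describes. The quadratic objective, gossip matrix, and variance values are illustrative assumptions, not the paper's setting or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d = 4, 10

# Doubly stochastic gossip matrix for a 4-node ring (rows and columns sum to 1).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

target = rng.standard_normal(d)
sigmas = np.array([0.1, 0.5, 1.0, 2.0])  # heterogeneous per-node noise levels
X = rng.standard_normal((n_nodes, d))    # one local iterate per node

def stoch_grad(x, sigma):
    """Gradient of 0.5 * ||x - target||^2 plus node-specific noise."""
    return (x - target) + sigma * rng.standard_normal(d)

lr = 0.1
for step in range(200):
    G = np.stack([stoch_grad(X[i], sigmas[i]) for i in range(n_nodes)])
    X = W @ (X - lr * G)  # local SGD step followed by gossip averaging

print(np.linalg.norm(X.mean(axis=0) - target))  # consensus near the optimum
```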

Towards Compressive and Scalable Recurrent Memory

arXiv:2602.11212v1 Announce Type: new Abstract: Transformers face a quadratic bottleneck in attention when scaling to long contexts. Recent approaches introduce recurrent memory to extend context beyond the current window, yet these often face a fundamental trade-off between theoretical principles and…
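
One common shape for such a recurrent memory is a fixed-size set of slots compressed from each segment by cross-attention-style pooling, so per-segment cost stays constant while effective context grows. The sketch below assumes that design (random slot queries standing in for learned ones); it is not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_slots = 32, 8

# Fixed-size memory: n_slots summary vectors, updated once per segment.
memory = np.zeros((n_slots, d))
slot_queries = rng.standard_normal((n_slots, d)) / np.sqrt(d)  # stand-in for learned queries

def update_memory(memory, segment, alpha=0.9):
    """Cross-attention-style compression: each slot softmax-pools the segment,
    and the pooled summary is blended into the old memory with decay alpha."""
    scores = slot_queries @ segment.T               # (n_slots, seg_len)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    summary = attn @ segment                        # (n_slots, d)
    return alpha * memory + (1 - alpha) * summary

# Stream segments through the memory: state stays O(n_slots * d) no matter
# how much context has gone by.
for _ in range(16):
    memory = update_memory(memory, rng.standard_normal((128, d)))
print(memory.shape)  # (8, 32) -- the memory never grows
```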

Benchmarking Vision-Language Models for French PDF-to-Markdown Conversion

arXiv:2602.11960v1 Announce Type: cross Abstract: This report evaluates PDF-to-Markdown conversion using recent Vision-Language Models (VLMs) on challenging French documents. Document parsing is a critical step for Retrieval-Augmented Generation (RAG) pipelines, where transcription and layout errors propagate to downstream retrieval and…
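
Benchmarks of this kind typically score a model's Markdown output against a reference transcription. A minimal sketch using character-level similarity is below; the metric choice, the model names, and the outputs are made up for illustration and are not taken from the report.

```python
import difflib

def normalized_similarity(pred_md: str, ref_md: str) -> float:
    """Character-level similarity between predicted and reference Markdown
    (1.0 = identical transcription and layout)."""
    return difflib.SequenceMatcher(None, pred_md, ref_md).ratio()

# Hypothetical outputs from two VLMs on the same French page.
reference = "# Rapport annuel\n\nLe chiffre d'affaires a augmenté de 12 %."
outputs = {
    "vlm_a": "# Rapport annuel\n\nLe chiffre d'affaires a augmenté de 12 %.",
    "vlm_b": "Rapport annuel\nLe chiffre daffaires a augmente de 12%.",
}
for name, pred in outputs.items():
    print(name, round(normalized_similarity(pred, reference), 3))
```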

Charting Empirical Laws for LLM Fine-Tuning in Scientific Multi-Discipline Learning

arXiv:2602.11215v1 Announce Type: new Abstract: While large language models (LLMs) have achieved strong performance through fine-tuning within individual scientific domains, their learning dynamics in multi-disciplinary contexts remain poorly understood, despite the promise of improved generalization and broader applicability through cross-domain…
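
Empirical-law studies of this kind usually fit a simple parametric curve to loss-versus-data points. The sketch below fits a power law L(N) = a * N^(-b) in log-log space; the data points are fabricated to illustrate the fitting procedure and are not results from the paper.

```python
import numpy as np

# Hypothetical fine-tuning losses at increasing numbers of training examples.
n_examples = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
losses = np.array([2.10, 1.72, 1.41, 1.18, 0.97])

# Fit L(N) = a * N^(-b) by linear regression in log-log space:
# log L = log a - b * log N.
slope, intercept = np.polyfit(np.log(n_examples), np.log(losses), 1)
a, b = np.exp(intercept), -slope
print(f"L(N) ~ {a:.2f} * N^(-{b:.3f})")
print("extrapolated loss at N = 3e5:", a * (3e5) ** (-b))
```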