Archives AI News

Generalization Below the Edge of Stability: The Role of Data Geometry

arXiv:2510.18120v3 Announce Type: replace-cross Abstract: Understanding generalization in overparameterized neural networks hinges on the interplay between the data geometry, neural architecture, and training dynamics. In this paper, we theoretically explore how data geometry controls this implicit bias. This paper presents…

Attribution-Guided Continual Learning for Large Language Models

arXiv:2605.05285v1 Announce Type: new Abstract: Large language models (LLMs) often suffer from catastrophic forgetting in continual learning: after learning new tasks sequentially, they perform worse on earlier tasks. Existing methods mitigate catastrophic forgetting by data replay, parameter freezing, or regularization.…

Graph Normalization: Fast Binarizing Dynamics for Differentiable MWIS

arXiv:2605.05330v1 Announce Type: new Abstract: We introduce Graph Normalization (GN), a principled dynamical system on graphs that serves as a differentiable approximation engine for the NP-hard Maximum Weight Independent Set (MWIS) problem. MWIS encompasses many combinatorial challenges, including optimal assignment,…

Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

arXiv:2605.05892v1 Announce Type: cross Abstract: Activation steering has emerged as a promising alternative for controlling language-model behavior at inference time by modifying intermediate representations while keeping model parameters frozen. However, large-scale evaluations such as AxBench show that existing steering methods…

Feature Starvation as Geometric Instability in Sparse Autoencoders

arXiv:2605.05341v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) are used to disentangle the dense, polysemantic internal representations of large language models (LLMs) into interpretable, monosemantic concepts. However, standard $\ell_1$-regularized SAEs suffer from feature starvation (dead neurons) and shrinkage bias, often…

Learning Discrete Autoregressive Priors with Wasserstein Gradient Flow

arXiv:2605.06148v1 Announce Type: cross Abstract: Discrete image tokenizers are commonly trained in two stages: first for reconstruction, and then with a prior model fitted to the frozen token sequences. This decoupling leaves the tokenizer unaware of the model that will…

A Multi-Head Attention Approach for SLA Compliance Monitoring in Data Centers

arXiv:2605.05354v1 Announce Type: new Abstract: Service level agreements (SLAs) in data center colocation contracts define precise thresholds for power, temperature, and humidity, with tiered violation penalties expressed as credits against monthly recurring charges. Traditional reactive monitoring detects breaches only after…