Archives AI News

Low-Rank Key Value Attention

arXiv:2601.11471v3 Announce Type: replace Abstract: The key-value (KV) cache is a primary memory bottleneck in Transformers. We propose Low-Rank Key-Value (LRKV) attention, which reduces KV cache memory by exploiting redundancy across attention heads, while being compute efficient. Each layer uses…

April 10, 2026

Bi-Level Optimization for Single Domain Generalization

arXiv:2604.06349v1 Announce Type: new Abstract: Generalizing from a single labeled source domain to unseen target domains, without access to any target data during training, remains a fundamental challenge in robust machine learning. We address this underexplored setting, known as Single…

April 10, 2026

Algebraic Diversity: Group-Theoretic Spectral Estimation from Single Observations

arXiv:2604.03634v2 Announce Type: replace Abstract: We establish that temporal averaging over multiple observations is the degenerate case of algebraic group action with the trivial group $G=\{e\}$. A General Replacement Theorem proves that a group-averaged estimator from one snapshot achieves equivalent…

April 10, 2026

Stochastic Gradient Descent in the Saddle-to-Saddle Regime of Deep Linear Networks

arXiv:2604.06366v1 Announce Type: new Abstract: Deep linear networks (DLNs) are used as an analytically tractable model of the training dynamics of deep neural networks. While gradient descent in DLNs is known to exhibit saddle-to-saddle dynamics, the impact of stochastic gradient…

April 10, 2026

A Giant-Step Baby-Step Classifier For Scalable and Real-Time Anomaly Detection In Industrial Control Systems and Water Treatment Systems

arXiv:2504.20906v4 Announce Type: replace-cross Abstract: The continuous monitoring of the interactions between cyber-physical components of any industrial control system (ICS) is required to secure automation of the system controls, and to guarantee plant processes are fail-safe and remain in an…

April 10, 2026

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

arXiv:2604.06377v1 Announce Type: new Abstract: We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions…

April 10, 2026

EvoFlows: Evolutionary Edit-Based Flow-Matching for Protein Engineering

arXiv:2603.11703v2 Announce Type: replace Abstract: We introduce EvoFlows, a variable-length protein sequence-to-sequence modeling approach designed for protein engineering. Existing protein language models are poorly suited for optimization tasks: autoregressive models require full sequence generation, masked language and discrete diffusion models…

April 10, 2026

Exploring Natural Language-Based Strategies for Efficient Number Learning in Children through Reinforcement Learning

arXiv:2410.08334v2 Announce Type: replace-cross Abstract: In this paper, we build a reinforcement learning framework to study how children compose numbers using base-ten blocks. Studying numerical cognition in toddlers offers a powerful window into the learning process itself, because numbers sit…

April 10, 2026

Inference-Time Scaling of Diffusion Language Models via Trajectory Refinement

arXiv:2507.08390v4 Announce Type: replace Abstract: Discrete diffusion models have recently emerged as strong alternatives to autoregressive language models, matching their performance through large-scale training. However, inference-time control remains relatively underexplored. In this work, we study how to steer generation toward…

April 10, 2026

Tensor-Efficient High-Dimensional Q-learning

arXiv:2511.03595v2 Announce Type: replace Abstract: High-dimensional reinforcement learning (RL) faces challenges with complex calculations and low sample efficiency in large state-action spaces. Q-learning algorithms struggle particularly with the curse of dimensionality, where the number of state-action pairs grows exponentially with problem…

April 10, 2026