Archives AI News

$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models

arXiv:2604.06260v1 Announce Type: new Abstract: Test-time scaling investigates whether a fixed diffusion language model (DLM) can generate better outputs when given more inference compute, without additional training. However, naive best-of-$K$ sampling is fundamentally limited because it repeatedly draws from the…

April 10, 2026

EvoFlows: Evolutionary Edit-Based Flow-Matching for Protein Engineering

arXiv:2603.11703v2 Announce Type: replace Abstract: We introduce EvoFlows, a variable-length protein sequence-to-sequence modeling approach designed for protein engineering. Existing protein language models are poorly suited for optimization tasks: autoregressive models require full sequence generation, masked language and discrete diffusion models…

April 10, 2026

Low-Rank Key Value Attention

arXiv:2601.11471v3 Announce Type: replace Abstract: The key-value (KV) cache is a primary memory bottleneck in Transformers. We propose Low-Rank Key-Value (LRKV) attention, which reduces KV cache memory by exploiting redundancy across attention heads, while being compute efficient. Each layer uses…

April 10, 2026

Spectral Edge Dynamics Reveal Functional Modes of Learning

arXiv:2604.06256v1 Announce Type: new Abstract: Training dynamics during grokking concentrate along a small number of dominant update directions — the spectral edge — which reliably distinguishes grokking from non-grokking regimes. We show that standard mechanistic interpretability tools (head attribution, activation…

April 10, 2026

Exploring Natural Language-Based Strategies for Efficient Number Learning in Children through Reinforcement Learning

arXiv:2410.08334v2 Announce Type: replace-cross Abstract: In this paper, we build a reinforcement learning framework to study how children compose numbers using base-ten blocks. Studying numerical cognition in toddlers offers a powerful window into the learning process itself, because numbers sit…

April 10, 2026

Bi-Level Optimization for Single Domain Generalization

arXiv:2604.06349v1 Announce Type: new Abstract: Generalizing from a single labeled source domain to unseen target domains, without access to any target data during training, remains a fundamental challenge in robust machine learning. We address this underexplored setting, known as Single…

April 10, 2026

FLeX: Fourier-based Low-rank EXpansion for multilingual transfer

arXiv:2604.06253v1 Announce Type: new Abstract: Cross-lingual code generation is critical in enterprise environments where multiple programming languages coexist. However, fine-tuning large language models (LLMs) individually for each language is computationally prohibitive. This paper investigates whether parameter-efficient fine-tuning methods and optimizer…

April 10, 2026

Algebraic Diversity: Group-Theoretic Spectral Estimation from Single Observations

arXiv:2604.03634v2 Announce Type: replace Abstract: We establish that temporal averaging over multiple observations is the degenerate case of algebraic group action with the trivial group $G={e}$. A General Replacement Theorem proves that a group-averaged estimator from one snapshot achieves equivalent…

April 10, 2026

Asymptotic-Preserving Neural Networks for Viscoelastic Parameter Identification in Multiscale Blood Flow Modeling

arXiv:2604.06287v1 Announce Type: new Abstract: Mathematical models and numerical simulations offer a non-invasive way to explore cardiovascular phenomena, providing access to quantities that cannot be measured directly. In this study, we start with a one-dimensional multiscale blood flow model that…

April 10, 2026

Stochastic Gradient Descent in the Saddle-to-Saddle Regime of Deep Linear Networks

arXiv:2604.06366v1 Announce Type: new Abstract: Deep linear networks (DLNs) are used as an analytically tractable model of the training dynamics of deep neural networks. While gradient descent in DLNs is known to exhibit saddle-to-saddle dynamics, the impact of stochastic gradient…

April 10, 2026