Archives AI News

Entropy After for reasoning model early exiting

arXiv:2509.26522v3 Announce Type: replace Abstract: Reasoning LLMs show improved performance with longer chains of thought. However, recent work has highlighted their tendency to overthink, continuing to revise answers even after reaching the correct solution. We quantitatively confirm this inefficiency from…

April 10, 2026

BiScale-GTR: Fragment-Aware Graph Transformers for Multi-Scale Molecular Representation Learning

arXiv:2604.06336v1 Announce Type: new Abstract: Graph Transformers have recently attracted attention for molecular property prediction by combining the inductive biases of graph neural networks (GNNs) with the global receptive field of Transformers. However, many existing hybrid architectures remain GNN-dominated, causing…

April 10, 2026

Low-Rank Key Value Attention

arXiv:2601.11471v3 Announce Type: replace Abstract: The key-value (KV) cache is a primary memory bottleneck in Transformers. We propose Low-Rank Key-Value (LRKV) attention, which reduces KV cache memory by exploiting redundancy across attention heads, while being compute efficient. Each layer uses…

April 10, 2026

Bi-Level Optimization for Single Domain Generalization

arXiv:2604.06349v1 Announce Type: new Abstract: Generalizing from a single labeled source domain to unseen target domains, without access to any target data during training, remains a fundamental challenge in robust machine learning. We address this underexplored setting, known as Single…

April 10, 2026

Algebraic Diversity: Group-Theoretic Spectral Estimation from Single Observations

arXiv:2604.03634v2 Announce Type: replace Abstract: We establish that temporal averaging over multiple observations is the degenerate case of algebraic group action with the trivial group $G={e}$. A General Replacement Theorem proves that a group-averaged estimator from one snapshot achieves equivalent…

April 10, 2026

Stochastic Gradient Descent in the Saddle-to-Saddle Regime of Deep Linear Networks

arXiv:2604.06366v1 Announce Type: new Abstract: Deep linear networks (DLNs) are used as an analytically tractable model of the training dynamics of deep neural networks. While gradient descent in DLNs is known to exhibit saddle-to-saddle dynamics, the impact of stochastic gradient…

April 10, 2026

A Giant-Step Baby-Step Classifier For Scalable and Real-Time Anomaly Detection In Industrial Control Systems and Water Treatment Systems

arXiv:2504.20906v4 Announce Type: replace-cross Abstract: The continuous monitoring of the interactions between cyber-physical components of any industrial control system (ICS) is required to secure automation of the system controls, and to guarantee plant processes are fail-safe and remain in an…

April 10, 2026

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

arXiv:2604.06377v1 Announce Type: new Abstract: We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions…

April 10, 2026

Probabilistic Predictions of Process-Induced Deformation in Carbon/Epoxy Composites Using a Deep Operator Network

arXiv:2512.13746v4 Announce Type: replace-cross Abstract: Fiber reinforcement and polymer matrix respond differently to manufacturing conditions due to mismatch in coefficient of thermal expansion and matrix shrinkage during curing of thermosets. These heterogeneities generate residual stresses over multiple length scales, whose…

April 10, 2026

Toward a universal foundation model for graph-structured data

arXiv:2604.06391v1 Announce Type: new Abstract: Graphs are a central representation in biomedical research, capturing molecular interaction networks, gene regulatory circuits, cell–cell communication maps, and knowledge graphs. Despite their importance, currently there is not a broadly reusable foundation model available for…

April 10, 2026