Archives AI News

Byzantine-Robust Optimization under $(L_0, L_1)$-Smoothness

arXiv:2603.12512v1. Announce Type: new. Abstract: We consider distributed optimization under Byzantine attacks in the presence of $(L_0,L_1)$-smoothness, a generalization of standard $L$-smoothness that captures functions with state-dependent gradient Lipschitz constants. We propose Byz-NSGDM, a normalized stochastic gradient descent method with…

March 16, 2026

Rethinking Attention: Polynomial Alternatives to Softmax in Transformers

arXiv:2410.18613v3. Announce Type: replace. Abstract: This paper questions whether the strong performance of softmax attention in transformers stems from producing a probability distribution over inputs. Instead, we argue that softmax’s effectiveness lies in its implicit regularization of the Frobenius norm…

March 16, 2026

Learning Pore-scale Multiphase Flow from 4D Velocimetry

arXiv:2603.12516v1. Announce Type: new. Abstract: Multiphase flow in porous media underpins subsurface energy and environmental technologies, including geological CO$_2$ storage and underground hydrogen storage, yet pore-scale dynamics in realistic three-dimensional materials remain difficult to characterize and predict. Here we introduce…

March 16, 2026

Accelerating Residual Reinforcement Learning with Uncertainty Estimation

arXiv:2506.17564v2. Announce Type: replace. Abstract: Residual Reinforcement Learning (RL) is a popular approach for adapting pretrained policies by learning a lightweight residual policy that provides corrective actions. While Residual RL is more sample-efficient than finetuning the entire base policy, existing…

March 16, 2026

Curriculum Sampling: A Two-Phase Curriculum for Efficient Training of Flow Matching

arXiv:2603.12517v1. Announce Type: new. Abstract: Timestep sampling $p(t)$ is a central design choice in Flow Matching models, yet common practice increasingly favors static middle-biased distributions (e.g., Logit-Normal). We show that this choice induces a speed–quality trade-off: middle-biased sampling accelerates early…

March 16, 2026

PreLoRA: Hybrid Pre-training of Vision Transformers with Full Training and Low-Rank Adapters

arXiv:2509.21619v2. Announce Type: replace. Abstract: Training large models ranging from millions to billions of parameters is highly resource-intensive, requiring significant time, compute, and memory. It is observed that most of the learning (higher change in weights) takes place in the…

March 16, 2026

When LLM Judge Scores Look Good but Best-of-N Decisions Fail

arXiv:2603.12520v1. Announce Type: new. Abstract: Large language models are often used as judges to score candidate responses, then validated with a single global metric such as correlation with reference labels. This can be misleading when the real deployment task is…

March 16, 2026

Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression

arXiv:2511.13421v2. Announce Type: replace. Abstract: While data scaling laws of large language models (LLMs) have been widely examined in the one-pass regime with massive corpora, their form under limited data and repeated epochs remains largely unexplored. This paper presents a…

March 16, 2026

TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning

arXiv:2603.12529v1. Announce Type: new. Abstract: Large Reasoning Models (LRMs) achieve impressive performance on complex reasoning tasks via Chain-of-Thought (CoT) reasoning, which enables them to generate intermediate thinking tokens before arriving at the final answer. However, LRMs often suffer from significant…

March 16, 2026

New sensor sniffs out pneumonia on a patient’s breath

The technology could enable fast, point-of-care diagnoses for pneumonia and other lung conditions.

March 16, 2026