
The Initialization Determines Whether In-Context Learning Is Gradient Descent

arXiv:2512.04268v1 Announce Type: new Abstract: In-context learning (ICL) in large language models (LLMs) is a striking phenomenon, yet its underlying mechanisms remain only partially understood. Previous work connects linear self-attention (LSA) to gradient descent (GD), but this connection has primarily been…

Random Feature Spiking Neural Networks

arXiv:2510.01012v2 Announce Type: replace Abstract: Spiking Neural Networks (SNNs) as Machine Learning (ML) models have recently received a lot of attention as a potentially more energy-efficient alternative to conventional Artificial Neural Networks. The non-differentiability and sparsity of the spiking mechanism…

When do spectral gradient updates help in deep learning?

arXiv:2512.04299v1 Announce Type: new Abstract: Spectral gradient methods, such as the recently popularized Muon optimizer, are a promising alternative to standard Euclidean gradient descent for training deep neural networks and transformers, but it is still unclear in which regimes they…

Triangle Multiplication Is All You Need For Biomolecular Structure Representations

arXiv:2510.18870v2 Announce Type: replace-cross Abstract: AlphaFold has transformed protein structure prediction, but emerging applications such as virtual ligand screening, proteome-wide folding, and de novo binder design demand predictions at a massive scale, where runtime and memory costs become prohibitive. A…

Evaluating Long-Context Reasoning in LLM-Based WebAgents

arXiv:2512.04307v1 Announce Type: new Abstract: As large language model (LLM)-based agents become increasingly integrated into daily digital interactions, their ability to reason across long interaction histories becomes crucial for providing personalized and contextually aware assistance. However, the performance of these…

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

arXiv:2512.05033v1 Announce Type: cross Abstract: Modern Large Language Models achieve impressive reasoning capabilities with long chains of thought, but they incur substantial computational cost during inference, which motivates techniques to improve the performance-cost ratio. Among these techniques, Speculative Decoding…