Archives AI News

Untangling Component Imbalance in Hybrid Linear Attention Conversion Methods

arXiv:2510.05901v2. Announce Type: replace. Abstract: Transformers’ quadratic computational complexity limits their scalability despite remarkable performance. While linear attention reduces this to linear complexity, pre-training such models from scratch remains, in most cases, prohibitively expensive. Recent post-training linearisation methods convert pre-trained…

LOTION: Smoothing the Optimization Landscape for Quantized Training

arXiv:2510.08757v1. Announce Type: new. Abstract: Optimizing neural networks for quantized objectives is fundamentally challenging because the quantizer is piecewise constant, yielding zero gradients everywhere except at quantization thresholds, where the derivative is undefined. Most existing methods deal with this issue…

Spatial Deconfounder: Interference-Aware Deconfounding for Spatial Causal Inference

arXiv:2510.08762v1. Announce Type: new. Abstract: Causal inference in spatial domains faces two intertwined challenges: (1) unmeasured spatial factors, such as weather, air pollution, or mobility, that confound treatment and outcome, and (2) interference from nearby treatments that violate standard no-interference…

AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models

arXiv:2505.17312v4. Announce Type: replace-cross. Abstract: LLMs often need effective configurations, like temperature and reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work…

MorphGen: Controllable and Morphologically Plausible Generative Cell-Imaging

arXiv:2510.01298v2. Announce Type: replace-cross. Abstract: Simulating in silico cellular responses to interventions is a promising direction to accelerate high-content image-based assays, critical for advancing drug discovery and gene editing. To support this, we introduce MorphGen, a state-of-the-art diffusion-based generative model…

Struc-EMB: The Potential of Structure-Aware Encoding in Language Embeddings

arXiv:2510.08774v1. Announce Type: new. Abstract: Text embeddings from Large Language Models (LLMs) have become foundational for numerous applications. However, these models typically operate on raw text, overlooking the rich structural information, such as hyperlinks or citations, that provides crucial context…