Archives AI News

Untangling Component Imbalance in Hybrid Linear Attention Conversion Methods

arXiv:2510.05901v2 Announce Type: replace Abstract: Transformers’ quadratic computational complexity limits their scalability despite remarkable performance. While linear attention reduces this to linear complexity, pre-training such models from scratch remains, in most cases, prohibitively expensive. Recent post-training linearisation methods convert pre-trained…

October 13, 2025

LOTION: Smoothing the Optimization Landscape for Quantized Training

arXiv:2510.08757v1 Announce Type: new Abstract: Optimizing neural networks for quantized objectives is fundamentally challenging because the quantizer is piece-wise constant, yielding zero gradients everywhere except at quantization thresholds where the derivative is undefined. Most existing methods deal with this issue…

October 13, 2025

Mini-batch Estimation for Deep Cox Models: Statistical Foundations and Practical Guidance

arXiv:2408.02839v2 Announce Type: replace-cross Abstract: The stochastic gradient descent (SGD) algorithm has been widely used to optimize deep Cox neural network (Cox-NN) by updating model parameters using mini-batches of data. We show that SGD aims to optimize the average of…

October 13, 2025

Spatial Deconfounder: Interference-Aware Deconfounding for Spatial Causal Inference

arXiv:2510.08762v1 Announce Type: new Abstract: Causal inference in spatial domains faces two intertwined challenges: (1) unmeasured spatial factors, such as weather, air pollution, or mobility, that confound treatment and outcome, and (2) interference from nearby treatments that violate standard no-interference…

October 13, 2025

AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models

arXiv:2505.17312v4 Announce Type: replace-cross Abstract: LLMs often need effective configurations, like temperature and reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work…

October 13, 2025

Reinforcement Learning-Based Optimization of CT Acquisition and Reconstruction Parameters Through Virtual Imaging Trials

arXiv:2510.08763v1 Announce Type: new Abstract: Protocol optimization is critical in Computed Tomography (CT) to achieve high diagnostic image quality while minimizing radiation dose. However, due to the complex interdependencies among CT acquisition and reconstruction parameters, traditional optimization methods rely on…

October 13, 2025

MorphGen: Controllable and Morphologically Plausible Generative Cell-Imaging

arXiv:2510.01298v2 Announce Type: replace-cross Abstract: Simulating in silico cellular responses to interventions is a promising direction to accelerate high-content image-based assays, critical for advancing drug discovery and gene editing. To support this, we introduce MorphGen, a state-of-the-art diffusion-based generative model…

October 13, 2025

Zero-Shot Policy Transfer in Reinforcement Learning using Buckingham’s Pi Theorem

arXiv:2510.08768v1 Announce Type: new Abstract: Reinforcement learning (RL) policies often fail to generalize to new robots, tasks, or environments with different physical parameters, a challenge that limits their real-world applicability. This paper presents a simple, zero-shot transfer method based on…

October 13, 2025

Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models

arXiv:2510.09259v1 Announce Type: cross Abstract: Data contamination poses a significant threat to the reliable evaluation of Large Language Models (LLMs). This issue arises when benchmark samples may inadvertently appear in training sets, compromising the validity of reported performance. While detection…

October 13, 2025

Struc-EMB: The Potential of Structure-Aware Encoding in Language Embeddings

arXiv:2510.08774v1 Announce Type: new Abstract: Text embeddings from Large Language Models (LLMs) have become foundational for numerous applications. However, these models typically operate on raw text, overlooking the rich structural information, such as hyperlinks or citations, that provides crucial context…

October 13, 2025