Archives AI News

How Transformers Learn to Plan via Multi-Token Prediction

arXiv:2604.11912v1. Announce Type: new. Abstract: While next-token prediction (NTP) has been the standard objective for training language models, it often struggles to capture global structure in reasoning tasks. Multi-token prediction (MTP) has recently emerged as a promising alternative, yet its…

Can AI Detect Life? Lessons from Artificial Life

arXiv:2604.11915v1. Announce Type: new. Abstract: Modern machine learning methods have been proposed to detect life in extraterrestrial samples, drawing on their ability to distinguish biotic from abiotic samples based on training models using natural and synthetic organic molecular mixtures. Here…

Robust Federated Inference

arXiv:2510.00310v3. Announce Type: replace. Abstract: Federated inference, in the form of one-shot federated learning, edge ensembles, or federated ensembles, has emerged as an attractive solution to combine predictions from multiple models. This paradigm enables each model to remain local and…

ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism

arXiv:2604.11947v1. Announce Type: new. Abstract: Unlocking large-scale low-bandwidth decentralized training has the potential to utilize otherwise untapped compute resources. In centralized settings, large-scale multi-node training is primarily enabled by data and pipeline parallelism, two techniques that require ultra-high-bandwidth communication. While…

The Linear Centroids Hypothesis: How Deep Network Features Represent Data

arXiv:2604.11962v1. Announce Type: new. Abstract: Identifying and understanding the features that a deep network (DN) extracts from its inputs to produce its outputs is a focal point of interpretability research. The Linear Representation Hypothesis (LRH) identifies features in terms of…