Archives AI News

On the Convergence Analysis of Muon

arXiv:2505.23737v2 Announce Type: replace-cross Abstract: The majority of parameters in neural networks are naturally represented as matrices. However, most commonly used optimizers treat these matrix parameters as flattened vectors during optimization, potentially overlooking their inherent structural properties. Recently, an optimizer…
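As background on the optimizer this paper analyzes: Muon keeps the matrix shape of a weight and orthogonalizes its momentum via a Newton–Schulz iteration instead of treating it as a flat vector. A minimal NumPy sketch follows, assuming the quintic coefficients from the public Muon implementation; it illustrates the update rule only, not the paper's convergence analysis.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize G (drive its singular values toward 1)."""
    # Quintic coefficients assumed from the public Muon implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)  # normalize by Frobenius norm
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # work with the wide orientation so X @ X.T is small
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(W, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style step: momentum update, then orthogonalized direction."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    return W - lr * update, momentum
```

The key contrast with Adam-style methods is that the update direction depends on the whole matrix (through its singular vectors), not on entrywise statistics of a flattened gradient.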

ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism

arXiv:2604.11947v1 Announce Type: new Abstract: Unlocking large-scale low-bandwidth decentralized training has the potential to utilize otherwise untapped compute resources. In centralized settings, large-scale multi-node training is primarily enabled by data and pipeline parallelism, two techniques that require ultra-high-bandwidth communication. While…
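To make the abstract's bandwidth claim concrete: in pipeline parallelism, each microbatch's activations must cross every stage boundary each step. A back-of-envelope sketch, with all shapes being illustrative assumptions rather than values from the paper:

```python
def activation_bytes(seq_len: int, hidden: int, microbatch: int,
                     bytes_per_elem: int = 2) -> int:
    """Bytes sent across one pipeline-stage boundary per microbatch (fp16)."""
    return seq_len * hidden * microbatch * bytes_per_elem

# Assumed shapes: 2048-token sequences, hidden size 4096, microbatch 1, fp16.
per_hop = activation_bytes(2048, 4096, 1)
print(per_hop / 2**20, "MiB per microbatch per boundary")  # → 16.0 MiB
```

Multiplied across many microbatches, stage boundaries, and steps per second, this is why naive pipeline parallelism presumes datacenter-grade interconnects; compressing these boundary tensors is the kind of problem the title's "residual bottleneck" suggests.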

The Linear Centroids Hypothesis: How Deep Network Features Represent Data

arXiv:2604.11962v1 Announce Type: new Abstract: Identifying and understanding the features that a deep network (DN) extracts from its inputs to produce its outputs is a focal point of interpretability research. The Linear Representation Hypothesis (LRH) identifies features in terms of…
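As commonly stated in interpretability work, the Linear Representation Hypothesis posits that a feature corresponds to a direction in activation space, so its strength can be read out with a dot product. A minimal sketch with synthetic vectors (not the paper's data or method):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
feature_dir = rng.standard_normal(d)
feature_dir /= np.linalg.norm(feature_dir)  # unit direction for one feature

# Synthetic activations: one with the feature added in, one without.
base = rng.standard_normal(d)
with_feature = base + 3.0 * feature_dir

def read_feature(activation: np.ndarray, direction: np.ndarray) -> float:
    # Under the LRH, feature strength is a linear readout: a projection.
    return float(activation @ direction)

delta = read_feature(with_feature, feature_dir) - read_feature(base, feature_dir)
print(round(delta, 3))  # → 3.0, the amount of the feature that was added
```

The paper's "Linear Centroids Hypothesis" presumably refines this picture; the sketch only shows the baseline LRH reading it departs from.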