Archives AI News

Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs

arXiv:2512.17131v3. Announce Type: replace. Abstract: We propose Generalized Primal Averaging (GPA), an extension of Nesterov’s method that unifies and generalizes recent averaging-based optimizers, such as single-worker DiLoCo and Schedule-Free, within a non-distributed setting. While DiLoCo relies on a memory-intensive two-loop structure…

Coping with catastrophe

Japan incorporates more disaster planning into its buildings and public spaces than any other nation. Miho Mazereeuw’s new book explains how they do it.

RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis

arXiv:2602.11506v2. Announce Type: replace. Abstract: The transition toward localized intelligence through Small Language Models (SLMs) has intensified the need for rigorous performance characterization on resource-constrained edge hardware. However, objectively measuring the theoretical performance ceilings of diverse architectures across heterogeneous platforms…

Deep Learning for Subspace Regression

arXiv:2509.23249v3. Announce Type: replace. Abstract: It is often possible to perform reduced-order modelling by specifying a linear subspace that accurately captures the dynamics of the system. This approach becomes especially appealing when the linear subspace explicitly depends on parameters of the…

Uncertainty-aware Language Guidance for Concept Bottleneck Models

arXiv:2602.23495v1. Announce Type: new. Abstract: Concept Bottleneck Models (CBMs) provide inherent interpretability by first mapping input samples to high-level semantic concepts and then combining these concepts for the final classification. However, the annotation of human-understandable concepts requires extensive…

On Minimal Depth in Neural Networks

arXiv:2402.15315v4. Announce Type: replace. Abstract: Understanding the relationship between the depth of a neural network and its representational capacity is a central problem in deep learning theory. In this work, we develop a geometric framework to analyze the expressivity of…