Archives AI News

Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs

arXiv:2512.17131v3 Announce Type: replace Abstract: We propose Generalized Primal Averaging (GPA), an extension of Nesterov’s method that unifies and generalizes recent averaging-based optimizers like single-worker DiLoCo and Schedule-Free, within a non-distributed setting. While DiLoCo relies on a memory-intensive two-loop structure…

March 2, 2026

Coping with catastrophe

Japan incorporates more disaster planning into its buildings and public spaces than any other nation. Miho Mazereeuw’s new book explains how they do it.

March 2, 2026

Conformalized Neural Networks for Federated Uncertainty Quantification under Dual Heterogeneity

arXiv:2602.23296v2 Announce Type: replace Abstract: Federated learning (FL) faces challenges in uncertainty quantification (UQ). Without reliable UQ, FL systems risk deploying overconfident models at under-resourced agents, leading to silent local failures despite seemingly satisfactory global performance. Existing federated UQ approaches…

March 2, 2026

pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models

arXiv:2507.05394v3 Announce Type: replace-cross Abstract: Vision-Language Models (VLMs) like CLIP have demonstrated remarkable generalization in zero- and few-shot settings, but adapting them efficiently to decentralized, heterogeneous data remains a challenge. While prompt tuning has emerged as a popular parameter-efficient approach…

March 2, 2026

Heterogeneous Multi-Agent Reinforcement Learning with Attention for Cooperative and Scalable Feature Transformation

arXiv:2511.21934v2 Announce Type: replace Abstract: Feature transformation enhances downstream task performance by generating informative features through mathematical feature crossing. Despite the advancements in deep learning, feature transformation remains essential for structured data, where deep models often struggle to capture complex…

March 2, 2026

RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis

arXiv:2602.11506v2 Announce Type: replace Abstract: The transition toward localized intelligence through Small Language Models (SLMs) has intensified the need for rigorous performance characterization on resource-constrained edge hardware. However, objectively measuring the theoretical performance ceilings of diverse architectures across heterogeneous platforms…

March 2, 2026

Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search

arXiv:2505.11601v3 Announce Type: replace Abstract: Feature selection removes redundant features to enhance performance and computational efficiency in downstream tasks. Existing works often struggle to capture complex feature interactions and adapt to diverse scenarios. Recent advances in this domain have incorporated…

March 2, 2026

Deep Learning for Subspace Regression

arXiv:2509.23249v3 Announce Type: replace Abstract: It is often possible to perform reduced order modelling by specifying a linear subspace which accurately captures the dynamics of the system. This approach becomes especially appealing when the linear subspace explicitly depends on parameters of the…

March 2, 2026

Uncertainty-aware Language Guidance for Concept Bottleneck Models

arXiv:2602.23495v1 Announce Type: new Abstract: Concept Bottleneck Models (CBMs) provide inherent interpretability by first mapping input samples to high-level semantic concepts, followed by a combination of these concepts for the final classification. However, the annotation of human-understandable concepts requires extensive…

March 2, 2026

On Minimal Depth in Neural Networks

arXiv:2402.15315v4 Announce Type: replace Abstract: Understanding the relationship between the depth of a neural network and its representational capacity is a central problem in deep learning theory. In this work, we develop a geometric framework to analyze the expressivity of…

March 2, 2026