Archives AI News

LayerSync: Self-aligning Intermediate Layers

arXiv:2510.12581v1 Announce Type: cross Abstract: We propose LayerSync, a domain-agnostic approach for improving the generation quality and the training efficiency of diffusion models. Prior studies have highlighted the connection between the quality of generation and the representations learned by diffusion…
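The truncated abstract refers to the standard diffusion-model training setup. As background, a minimal sketch of the usual epsilon-prediction objective (plain DDPM-style forward noising and MSE loss; this is generic background, not LayerSync's method, and `alpha_bar_t` here is a scalar stand-in for the cumulative noise schedule):

```python
import math

def noise_sample(x0, eps, alpha_bar_t):
    # Forward diffusion for one scalar sample:
    # x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps

def denoising_loss(eps_pred, eps):
    # Unweighted epsilon-prediction objective: (eps_pred - eps)^2.
    # A real model would produce eps_pred = model(x_t, t).
    return (eps_pred - eps) ** 2
```

In practice `x0` and `eps` are tensors and the loss is averaged over a batch; the scalar form above just makes the schedule arithmetic explicit.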

QLENS: Towards A Quantum Perspective of Language Transformers

arXiv:2510.11963v1 Announce Type: new Abstract: In natural language processing, current methods for understanding Transformers are successful at identifying intermediate predictions during a model’s inference. However, these approaches function as limited diagnostic checkpoints, lacking a mathematical framework for mechanistically modeling how…

WW-FL: Secure and Private Large-Scale Federated Learning

arXiv:2302.09904v4 Announce Type: replace Abstract: Federated learning (FL) is an efficient approach for large-scale distributed machine learning that promises data privacy by keeping training data on client devices. However, recent research has uncovered vulnerabilities in FL, impacting both security and…
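The abstract's premise — training data stays on client devices — is the standard federated averaging setup. A minimal server-side FedAvg sketch (generic background, not WW-FL's secure protocol; weights are plain Python lists for illustration):

```python
def fedavg(client_weights, client_sizes):
    """Server-side FedAvg: average client model parameters,
    weighted by each client's local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    avg = [0.0] * n_params
    for w, n in zip(client_weights, client_sizes):
        for i in range(n_params):
            avg[i] += (n / total) * w[i]
    return avg
```

The vulnerabilities the paper targets arise because the server sees these raw client updates; secure-aggregation schemes hide the individual `w` vectors while still letting the weighted average be computed.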

Learning Dynamics of VLM Finetuning

arXiv:2510.11978v1 Announce Type: new Abstract: Preference-based finetuning of vision–language models (VLMs) is brittle: trivially wrong negatives inject uninformative gradients that destabilize training. We recast alignment as learning-dynamics–aware optimization and introduce Cooling-Weighted DPO (CW-DPO), a two-stage recipe that explicitly models and…
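CW-DPO's cooling weights are not spelled out in this truncated abstract, so the following shows only the standard Direct Preference Optimization objective it extends — a minimal sketch with scalar per-sequence log-probabilities (hypothetical inputs, not the paper's full method):

```python
import math

def dpo_loss(logp_w_pi, logp_l_pi, logp_w_ref, logp_l_ref, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * [(logpi_w - logref_w) - (logpi_l - logref_l)]),
    where _w is the chosen ("winner") response and _l the rejected one."""
    margin = beta * ((logp_w_pi - logp_w_ref) - (logp_l_pi - logp_l_ref))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The abstract's failure mode is visible here: a trivially wrong negative makes the margin large regardless of model quality, so its gradient carries little signal — which is what a cooling weight would presumably down-weight.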

Toward Fair Graph Neural Networks Via Dual-Teacher Knowledge Distillation

arXiv:2412.00382v2 Announce Type: replace Abstract: Graph Neural Networks (GNNs) have demonstrated strong performance in graph representation learning across various real-world applications. However, they often produce biased predictions caused by sensitive attributes, such as religion or gender, an issue that has…
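For background on the distillation half of the title, a minimal sketch of a single-teacher Hinton-style distillation loss (temperature-softened KL divergence, scaled by T²). How the paper combines its two teachers is not stated in this excerpt, so only the generic one-teacher objective is shown:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-softened softmax over a list of logits.
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_kl(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 to keep gradient magnitudes comparable across T."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (T ** 2) * kl
```

A dual-teacher scheme would add a second KL term (or mix the teachers' soft targets) with some weighting — the specifics of the paper's fairness-oriented weighting are beyond this snippet.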

Learning by Steering the Neural Dynamics: A Statistical Mechanics Perspective

arXiv:2510.11984v1 Announce Type: new Abstract: Despite the striking successes of deep neural networks trained with gradient-based optimization, these methods differ fundamentally from their biological counterparts. This gap raises key questions about how nature achieves robust, sample-efficient learning at minimal energy…