Online Posterior Sampling with a Diffusion Prior

arXiv:2410.03919v2 Announce Type: replace Abstract: Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we…

Preventing Rank Collapse in Federated Low-Rank Adaptation with Client Heterogeneity

arXiv:2602.13486v1 Announce Type: new Abstract: Federated low-rank adaptation (FedLoRA) has facilitated communication-efficient and privacy-preserving fine-tuning of foundation models for downstream tasks. In practical federated learning scenarios, client heterogeneity in system resources and data distributions motivates heterogeneous LoRA ranks across clients.…

Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs

arXiv:2506.13593v5 Announce Type: replace Abstract: We introduce time-to-unsafe-sampling, a novel safety measure for generative models, defined as the number of generations required by a large language model (LLM) to trigger an unsafe (e.g., toxic) response. While providing a new dimension…

TrasMuon: Trust-Region Adaptive Scaling for Orthogonalized Momentum Optimizers

arXiv:2602.13498v1 Announce Type: new Abstract: Muon-style optimizers leverage Newton-Schulz (NS) iterations to orthogonalize updates, yielding update geometries that often outperform Adam-series methods. However, this orthogonalization discards magnitude information, rendering training sensitive to step-size hyperparameters and vulnerable to high-energy bursts. To…

Discrete State Diffusion Models: A Sample Complexity Perspective

arXiv:2510.10854v2 Announce Type: replace Abstract: Diffusion models have demonstrated remarkable performance in generating high-dimensional samples across domains such as vision, language, and the sciences. Although continuous-state diffusion models have been extensively studied both empirically and theoretically, discrete-state diffusion models, essential…

NeuroPareto: Calibrated Acquisition for Costly Many-Goal Search in Vast Parameter Spaces

arXiv:2602.03901v2 Announce Type: replace Abstract: The pursuit of optimal trade-offs in high-dimensional search spaces under stringent computational constraints poses a fundamental challenge for contemporary multi-objective optimization. We develop NeuroPareto, a cohesive architecture that integrates rank-centric filtering, uncertainty disentanglement, and history-conditioned…

Singular Vectors of Attention Heads Align with Features

arXiv:2602.13524v1 Announce Type: new Abstract: Identifying feature representations in language models is a central task in mechanistic interpretability. Several recent studies have made an implicit assumption that feature representations can be inferred in some cases from singular vectors of attention…

Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent

arXiv:2501.01696v2 Announce Type: replace-cross Abstract: Tensors, which give a faithful and effective representation to deliver the intrinsic structure of multi-dimensional data, play a crucial role in an increasing number of signal processing and machine learning problems. However, tensor data are…