Archives AI News

FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-World LoRA

arXiv:2503.11880v3 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) in federated settings enables privacy-preserving adaptation but suffers from cross-client interference due to model aggregation. Existing federated LoRA fine-tuning methods, primarily based on FedAvg, struggle with data heterogeneity, leading to…

Convergence Bound and Critical Batch Size of Muon Optimizer

arXiv:2507.01598v3 Announce Type: replace Abstract: Muon, a recently proposed optimizer that leverages the inherent matrix structure of neural network parameters, has demonstrated strong empirical performance, indicating its potential as a successor to standard optimizers such as AdamW. This paper presents…

Transformers know more than they can tell — Learning the Collatz sequence

arXiv:2511.10811v1 Announce Type: new Abstract: We investigate transformer prediction of long Collatz steps, a complex arithmetic function that maps odd integers to their distant successors in the Collatz sequence ($u_{n+1}=u_n/2$ if $u_n$ is even, $u_{n+1}=(3u_n+1)/2$ if $u_n$ is odd).…
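
For readers unfamiliar with the recurrence quoted in this abstract, here is a minimal Python sketch of the single-step map exactly as written there. It is not the paper's "long Collatz step", whose definition is cut off in this listing. The function names `collatz_step` and `collatz_orbit` and the `max_steps` cap are illustrative assumptions, not taken from the paper.

```python
def collatz_step(u: int) -> int:
    """One step of the map quoted in the abstract (hypothetical helper name)."""
    if u <= 0:
        raise ValueError("u must be a positive integer")
    # Even: halve. Odd: 3u + 1 is always even, so the integer division is exact.
    return u // 2 if u % 2 == 0 else (3 * u + 1) // 2


def collatz_orbit(u: int, max_steps: int = 1000) -> list[int]:
    """Iterate collatz_step from u until 1 is reached or max_steps steps are taken."""
    orbit = [u]
    while orbit[-1] != 1 and len(orbit) <= max_steps:
        orbit.append(collatz_step(orbit[-1]))
    return orbit


if __name__ == "__main__":
    print(collatz_orbit(7))
```

Starting from 7, the orbit passes through 11, 17, 26, 13, 20, 10, 5, 8, 4, 2 and ends at 1.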

Towards Uncertainty Quantification in Generative Model Learning

arXiv:2511.10710v1 Announce Type: new Abstract: While generative models have become increasingly prevalent across various domains, fundamental concerns regarding their reliability persist. A crucial yet understudied aspect of these models is the uncertainty quantification surrounding their distribution approximation capabilities. Current evaluation…

Bias-Restrained Prefix Representation Finetuning for Mathematical Reasoning

arXiv:2511.10707v1 Announce Type: new Abstract: Parameter-Efficient finetuning (PEFT) enhances model performance on downstream tasks by updating a minimal subset of parameters. Representation finetuning (ReFT) methods further improve efficiency by freezing model weights and optimizing internal representations with fewer parameters than…

Differentiable Sparse Identification of Lagrangian Dynamics

arXiv:2511.10706v1 Announce Type: new Abstract: Data-driven discovery of governing equations from data remains a fundamental challenge in nonlinear dynamics. Although sparse regression techniques have advanced system identification, they struggle with rational functions and noise sensitivity in complex mechanical systems. The…