Archives AI News

Differentially Private Model Merging

arXiv:2604.20985v1 Announce Type: new Abstract: In machine learning applications, privacy requirements during inference or deployment time could change constantly due to varying policies, regulations, or user experience. In this work, we aim to generate a multitude of models to satisfy…

HyperAdapt: Simple High-Rank Adaptation

arXiv:2509.18629v3 Announce Type: replace Abstract: Foundation models excel across diverse tasks, but adapting them to specialized applications often requires fine-tuning, an approach that is memory and compute-intensive. Parameter-efficient fine-tuning (PEFT) methods mitigate this by updating only a small subset of…

Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse

arXiv:2511.00413v5 Announce Type: replace Abstract: Agentic large language model (LLM) training often involves multi-turn interaction trajectories that branch into multiple execution paths due to concurrent tool use, think-mode, sub-agent, context management and other runtime designs. As a result, the tokens…
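To make "shared prefix reuse" concrete, here is a minimal sketch that is not the paper's method. It assumes each branched trajectory is a list of token IDs, then counts how many tokens would be processed if every branch were handled independently versus once per node of a prefix tree (trie), where a shared prefix is stored a single time. The function name `count_tokens` and the toy token IDs are hypothetical.

```python
def count_tokens(trajectories: list[list[int]]) -> tuple[int, int]:
    """Return (naive_tokens, trie_tokens) for a set of branched trajectories.

    naive_tokens: every trajectory's tokens are processed separately.
    trie_tokens: tokens in a common prefix are counted once, i.e. the
    number of nodes in a prefix tree built from all trajectories.
    """
    naive = sum(len(t) for t in trajectories)
    root: dict = {}
    trie_nodes = 0
    for traj in trajectories:
        node = root
        for tok in traj:
            if tok not in node:
                # First time this token follows this exact prefix: new node.
                node[tok] = {}
                trie_nodes += 1
            node = node[tok]
    return naive, trie_nodes


# Hypothetical example: one shared prompt [1, 2, 3] branching into three paths.
branches = [
    [1, 2, 3, 10, 11],
    [1, 2, 3, 20],
    [1, 2, 3, 30, 31, 32],
]
print(count_tokens(branches))  # (15, 9)
```

Here the three-token prefix is paid for once instead of three times; the saving grows with the length of the shared prefix and the number of branches.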

SGD at the Edge of Stability: The Stochastic Sharpness Gap

arXiv:2604.21016v1 Announce Type: new Abstract: When training neural networks with full-batch gradient descent (GD) and step size $\eta$, the largest eigenvalue of the Hessian (the sharpness $S(\boldsymbol{\theta})$) rises to $2/\eta$ and hovers there, a phenomenon termed the Edge…
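The $2/\eta$ value has a classical explanation on a quadratic: for $f(\theta) = \tfrac{S}{2}\theta^2$ the GD update is $\theta \leftarrow (1 - \eta S)\theta$, which contracts exactly when $|1 - \eta S| < 1$, i.e. $0 < S < 2/\eta$. The sketch below is a toy one-dimensional illustration, not anything from the paper: it runs GD on such a quadratic with sharpness just below and just above that threshold. The function name `run_gd` and the constants are hypothetical.

```python
def run_gd(sharpness: float, eta: float, theta0: float = 1.0, steps: int = 50) -> float:
    """Run GD on f(theta) = 0.5 * sharpness * theta**2 and return |theta|.

    The gradient is sharpness * theta, so each step multiplies theta by
    (1 - eta * sharpness); iterates shrink iff |1 - eta * sharpness| < 1.
    """
    theta = theta0
    for _ in range(steps):
        theta -= eta * sharpness * theta
    return abs(theta)


eta = 0.1  # stability threshold is 2 / eta = 20
print(run_gd(sharpness=19.0, eta=eta))  # factor about -0.9 per step: decays toward 0
print(run_gd(sharpness=21.0, eta=eta))  # factor about -1.1 per step: oscillates and grows
```

The negative factor means the iterate flips sign every step; above $2/\eta$ those oscillations grow instead of damping out.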

BackPlay: Head-Only Look-Back Self-Correction for Diffusion Language Models

arXiv:2601.06428v3 Announce Type: replace Abstract: Diffusion Language Models (DLMs) decode multiple tokens in parallel, but aggressive multi-token decoding amplifies cross-token dependency errors and can sharply degrade generation quality. We propose BackPlay, a frozen-backbone self-correction framework that trains only a lightweight…

MCAP: Deployment-Time Layer Profiling for Memory-Constrained LLM Inference

arXiv:2604.21026v1 Announce Type: new Abstract: Deploying large language models to heterogeneous hardware is often constrained by memory, not compute. We introduce MCAP (Monte Carlo Activation Profiling), a load-time per-layer importance estimator that enables dynamic precision and memory placement decisions on…

Continuous-Utility Direct Preference Optimization

arXiv:2602.00931v2 Announce Type: replace Abstract: Large language model reasoning is often treated as a monolithic capability, relying on binary preference supervision that fails to capture partial progress or fine-grained reasoning quality. We introduce Continuous Utility Direct Preference Optimization (CU-DPO), a…
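For context, the binary preference supervision mentioned here is exemplified by the standard DPO objective, which for a prompt $x$ with preferred response $y_w$ and dispreferred response $y_l$ minimizes $-\log\sigma\big(\beta[(\log\pi_\theta(y_w|x) - \log\pi_{\text{ref}}(y_w|x)) - (\log\pi_\theta(y_l|x) - \log\pi_{\text{ref}}(y_l|x))]\big)$. The sketch below computes that per-pair loss from summed sequence log-probabilities. It is the baseline DPO loss, not the CU-DPO objective; the function name `dpo_loss`, the default `beta`, and the numeric inputs are hypothetical.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard (binary-preference) DPO loss for a single preference pair.

    Inputs are summed log-probabilities of each full response under the
    trained policy and under the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) = softplus(-margin), computed stably for either sign.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy equals reference: margin is 0 and the loss is log 2.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy shifts probability toward the preferred response: loss drops below log 2.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

Each pair contributes only a "which response is better" signal, which is the limitation the abstract says CU-DPO addresses.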

BioTrain: Sub-MB, Sub-50mW On-Device Fine-Tuning for Edge-AI on Biosignals

arXiv:2604.13359v2 Announce Type: replace Abstract: Biosignals exhibit substantial cross-subject and cross-session variability, inducing severe domain shifts that degrade post-deployment performance for small, edge-oriented AI models. On-device adaptation is therefore essential to both preserve user privacy and ensure system reliability. However,…