Archives AI News

Constrained Adaptive Rejection Sampling

arXiv:2510.01902v1 Announce Type: cross Abstract: Language Models (LMs) are increasingly used in applications where generated outputs must satisfy strict semantic or syntactic constraints. Existing approaches to constrained generation fall along a spectrum: greedy constrained decoding methods enforce validity during decoding…

Flatness-Aware Stochastic Gradient Langevin Dynamics

arXiv:2510.02174v1 Announce Type: cross Abstract: Generalization in deep learning is closely tied to the pursuit of flat minima in the loss landscape, yet classical Stochastic Gradient Langevin Dynamics (SGLD) offers no mechanism to bias its dynamics toward such low-curvature solutions.…

Optimal Denoising in Score-Based Generative Models: The Role of Data Regularity

arXiv:2503.12966v2 Announce Type: replace-cross Abstract: Score-based generative models achieve state-of-the-art sampling performance by denoising a distribution perturbed by Gaussian noise. In this paper, we focus on a single deterministic denoising step, and compare the optimal denoiser for the quadratic loss,…

AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs

arXiv:2407.20177v5 Announce Type: replace-cross Abstract: Domain reweighting is an emerging research area aimed at adjusting the relative weights of different data sources to improve the effectiveness and efficiency of LLM pre-training. We show that data mixtures that perform well at…

Convergence analysis of online algorithms for vector-valued kernel regression

arXiv:2309.07779v5 Announce Type: replace Abstract: We consider the problem of approximating the regression function $f_\mu:\, \Omega \to Y$ from noisy $\mu$-distributed vector-valued data $(\omega_m,y_m)\in\Omega\times Y$ by an online learning algorithm using a reproducing kernel Hilbert space $H$ (RKHS) as prior.…

Golden Ratio Weighting Prevents Model Collapse

arXiv:2502.18049v3 Announce Type: replace Abstract: Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and developing more effective…

Uniform-in-time convergence bounds for Persistent Contrastive Divergence Algorithms

arXiv:2510.01944v1 Announce Type: new Abstract: We propose a continuous-time formulation of persistent contrastive divergence (PCD) for maximum likelihood estimation (MLE) of unnormalised densities. Our approach expresses PCD as a coupled, multiscale system of stochastic differential equations (SDEs), which perform optimisation…

Adaptive Kernel Selection for Stein Variational Gradient Descent

arXiv:2510.02067v1 Announce Type: new Abstract: A central challenge in Bayesian inference is efficiently approximating posterior distributions. Stein Variational Gradient Descent (SVGD) is a popular variational inference method which transports a set of particles to approximate a target distribution. The SVGD…