arXiv:2606.11255v2 Announce Type: new
Abstract: Bernstein–Schur kernels are products of a finite-feature kernel and a completely monotone shift-invariant kernel: nonstationary kernels falling between the shift-invariant and dot-product templates random features exploit, so neither Bochner sampling nor polynomial sketching applies to the full kernel directly. We give one random-feature construction for the whole class that randomizes both factors: it sketches the finite modulation and samples the radial factor’s one-dimensional Bernstein–Widder scale before applying Gaussian random Fourier features, giving feature dimension $Dm$, free of the $O(d^2)$ size of the exact modulation feature. With the modulation kept exact (the $mtoinfty$ limit), we prove unbiasedness, an exact variance, and a matrix-Bernstein operator-norm bound controlled by the top kernel and modulation eigenvalues and an intrinsic dimension rather than the crude $Nmax_{ij}$ route. Whitening this argument at the ridge makes the effective dimension $d_{mathrm{eff}}(lambda)$ the emph{exact} intrinsic dimension of the matrix variance, so $O((1+|P|_{mathrm{op}}/lambda)log(d_{mathrm{eff}}/delta))$ radial draws preserve the kernel-ridge solution; tilting the draw by a closed-form whitened leverage improves this to the effective-dimension count $O((1+d_{mathrm{eff}})log(d_{mathrm{eff}}/delta))$. Conditioning on the sketch carries every guarantee to the deployed doubly-randomized estimator up to one additive sketch term, and all hold for the whole class with the modulation Gram in place of the polynomial one. The flagship instance is the biased $yat$-kernel $k_{yat,b}(w,x)=(w^top x+b)^2/(|w-x|^2+varepsilon)$, whose family span contains the inverse-multiquadric kernel by finite differences in $b$.
