Eigenvalue distribution of the Neural Tangent Kernel in the quadratic scaling

arXiv:2508.20036v1 Announce Type: cross Abstract: We compute the asymptotic eigenvalue distribution of the neural tangent kernel of a two-layer neural network under a specific scaling of dimension. Namely, if $X\in\mathbb{R}^{n\times d}$ is an i.i.d. random matrix, $W\in\mathbb{R}^{d\times p}$ is an i.i.d. $\mathcal{N}(0,1)$ matrix, and $D\in\mathbb{R}^{p\times p}$ is a diagonal matrix with i.i.d. bounded entries, we consider the matrix \[ \mathrm{NTK} = \frac{1}{d}XX^\top \odot \frac{1}{p} \sigma'\left( \frac{1}{\sqrt{d}}XW \right)D^2 \sigma'\left( \frac{1}{\sqrt{d}}XW \right)^\top, \] where $\sigma'$ is a pseudo-Lipschitz function applied entrywise, under the scaling $\frac{n}{dp}\to \gamma_1$ and $\frac{p}{d}\to \gamma_2$. We describe the asymptotic distribution as the free multiplicative convolution of the Marchenko--Pastur distribution with a deterministic distribution depending on $\sigma$ and $D$.
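
To make the matrix in the abstract concrete, the following is a minimal NumPy sketch that builds one finite-size sample of it and computes its eigenvalues. It is an illustration only, not the paper's method. Several choices are assumptions not fixed by the abstract: the function name `empirical_ntk` is hypothetical, the entries of $X$ are taken to be standard Gaussian (the abstract only says i.i.d.), the diagonal of $D$ is drawn uniformly from $[-1, 1]$ (the abstract only says bounded), and $\sigma' = \tanh$ is picked as one example of a Lipschitz function.

```python
import numpy as np


def empirical_ntk(n, d, p, sigma_prime=np.tanh, seed=0):
    """Sample the n x n matrix (1/d) X X^T  (Hadamard)  (1/p) s'(XW/sqrt(d)) D^2 s'(XW/sqrt(d))^T.

    Assumptions (not from the abstract): X has standard Gaussian entries,
    diag(D) is uniform on [-1, 1], and sigma_prime defaults to tanh.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))          # data matrix, i.i.d. entries
    W = rng.standard_normal((d, p))          # first-layer weights, i.i.d. N(0, 1)
    diag_D = rng.uniform(-1.0, 1.0, size=p)  # bounded diagonal entries of D

    gram = X @ X.T / d                       # (1/d) X X^T
    S = sigma_prime(X @ W / np.sqrt(d))      # sigma' applied entrywise, shape (n, p)
    feature = (S * diag_D**2) @ S.T / p      # (1/p) S D^2 S^T

    return gram * feature                    # entrywise (Hadamard) product


# Sizes chosen so that n / (d p) = 1 and p / d = 0.5.
n, d, p = 200, 20, 10
K = empirical_ntk(n, d, p)
eigenvalues = np.linalg.eigvalsh(K)          # real eigenvalues in ascending order
```

A histogram of `eigenvalues` at growing sizes, with $n/(dp)$ and $p/d$ held fixed, gives an empirical spectrum to compare against the limiting distribution the abstract describes. Both factors are positive semidefinite, so by the Schur product theorem their Hadamard product is too, and the eigenvalues should be nonnegative up to rounding error.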

August 28, 2025

Original: https://arxiv.org/abs/2508.20036