A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation
arXiv:2505.20172v2 Announce Type: replace Abstract: We study the dynamics of gradient flow with small weight decay on general training losses $F: \mathbb{R}^d \to \mathbb{R}$. Under mild regularity assumptions and assuming convergence of the unregularised gradient flow, we show that the…
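
As a rough illustration of the setting described above (gradient flow with small weight decay on a loss $F$), the following minimal sketch runs explicit Euler steps on a toy loss. Everything specific here is an assumption made for illustration and is not taken from the paper: the loss $F(w_1, w_2) = \tfrac{1}{2}(w_1 w_2 - 1)^2$, the starting point, the step size, the weight-decay strength `lam`, and the helper names `loss` and `simulate`. It only shows two time scales, a fast fit of the loss followed by a slow reduction of the parameter norm along the near-zero-loss set, and makes no attempt to reproduce the Riemannian structure named in the title.

```python
# Toy sketch (assumed example, not the paper's construction):
# Euler discretisation of gradient flow with small weight decay,
#   dw/dt = -grad F(w) - lam * w,
# on the hypothetical loss F(w1, w2) = 0.5 * (w1 * w2 - 1)^2.

def loss(w1, w2):
    """Unregularised training loss F; zero on the hyperbola w1 * w2 = 1."""
    return 0.5 * (w1 * w2 - 1.0) ** 2


def simulate(lam=1e-3, lr=0.05, steps=100_000, early=400, w=(3.0, 0.1)):
    """Return the parameters after `early` steps and after all `steps`."""
    w1, w2 = w
    early_state = None
    for step in range(1, steps + 1):
        r = w1 * w2 - 1.0              # residual of the fit
        g1, g2 = r * w2, r * w1        # gradient of F
        # Simultaneous update of both coordinates, with weight decay lam * w.
        w1, w2 = w1 - lr * (g1 + lam * w1), w2 - lr * (g2 + lam * w2)
        if step == early:
            early_state = (w1, w2)
    return early_state, (w1, w2)


def describe(label, state):
    """Print the loss and squared Euclidean norm at a given state."""
    w1, w2 = state
    print(f"{label}: w=({w1:.4f}, {w2:.4f}) "
          f"loss={loss(w1, w2):.2e} norm^2={w1 * w1 + w2 * w2:.4f}")


early_state, final_state = simulate()
describe("after fast phase", early_state)
describe("after slow phase", final_state)
```

In this toy run the loss is already close to zero at the early checkpoint while the squared norm is still large; only after a time of order `1 / lam` do the parameters drift towards the balanced point near $(1, 1)$, which has the smallest Euclidean norm on the zero-loss set of this particular example.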
