The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold
arXiv:2511.01938v1 Announce Type: new Abstract: Grokking is a puzzling phenomenon in neural networks where full generalization occurs only after a substantial delay following the complete memorization of the training data. Previous research has linked this delayed generalization to representation learning…
