Dimensional Criticality at Grokking Across MLPs and Transformers
arXiv:2604.16431v1 Announce Type: new Abstract: Abrupt transitions between distinct dynamical regimes are a hallmark of complex systems. Grokking in deep neural networks provides a striking example — an abrupt transition from memorization to generalization long after training accuracy saturates —…
