The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
arXiv:2602.18523v1

Abstract: Grokking, the abrupt transition from memorization to generalization long after training loss has reached near zero, has been studied mainly in single-task settings. We extend geometric analysis to multi-task modular arithmetic, training shared-trunk Transformers on dual-task…
