Beyond the Ideal: Analyzing the Inexact Muon Update
arXiv:2510.19933v1
Abstract: The Muon optimizer has rapidly emerged as a powerful, geometry-aware alternative to AdamW, demonstrating strong performance in large-scale training of neural networks. However, a critical theory-practice disconnect exists: Muon’s efficiency relies on fast, approximate orthogonalization,…
