Muon+: Towards Better Muon via One Additional Normalization Step
arXiv:2602.21545v2

Abstract: The Muon optimizer has demonstrated promising performance in pre-training large language models through gradient (or momentum) orthogonalization. In this work, we propose a simple yet effective enhancement to Muon, namely Muon+, which introduces an additional…
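
For readers unfamiliar with the "gradient (or momentum) orthogonalization" that the abstract attributes to Muon, the following is a minimal sketch of the baseline idea only. It is an illustration under assumptions, not the authors' implementation: it swaps in the classical cubic Newton-Schulz iteration where Muon implementations typically use a tuned polynomial variant, and the helper names (`orthogonalize`, `muon_like_step`) and the hyperparameter values (`lr`, `beta`, `steps`) are hypothetical. The additional step that Muon+ introduces is cut off in the text above, so it is not sketched.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthogonalize(m, steps=15):
    """Approximate the orthogonal polar factor of m (assumed full rank).

    Uses the classical cubic Newton-Schulz iteration
        X <- 1.5 * X - 0.5 * X @ X.T @ X,
    which pushes every singular value of X towards 1.
    """
    # Dividing by the Frobenius norm puts all singular values in (0, 1],
    # which is inside the convergence region of the iteration.
    norm = math.sqrt(sum(v * v for row in m for v in row)) + 1e-12
    x = [[v / norm for v in row] for row in m]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(rx, rc)]
             for rx, rc in zip(x, xxt_x)]
    return x


def muon_like_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One illustrative update: accumulate momentum, orthogonalize it, then step.

    The values of lr and beta are placeholders chosen for this example.
    """
    momentum = [[beta * mv + gv for mv, gv in zip(rm, rg)]
                for rm, rg in zip(momentum, grad)]
    update = orthogonalize(momentum)
    weight = [[wv - lr * uv for wv, uv in zip(rw, ru)]
              for rw, ru in zip(weight, update)]
    return weight, momentum
```

Calling `muon_like_step(weight, grad, momentum)` on a 2-D weight matrix returns the new weight and the new momentum buffer. Because the update is approximately orthogonal, its singular values are all close to 1 regardless of the scale of the raw gradient.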
