MuLoCo: Muon is a practical inner optimizer for DiLoCo
arXiv:2505.23725v2
Abstract: DiLoCo is a powerful framework for training large language models (LLMs), enabling larger optimal batch sizes and increased accelerator utilization under networking constraints. However, DiLoCo’s performance has been shown to degrade as the number of…
