Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression
arXiv:2603.02217v1 Announce Type: new

Abstract: Mixture-of-Experts (MoE) models scale capacity efficiently, but their massive parameter footprint creates a deployment-time memory bottleneck. We organize retraining-free MoE compression into three paradigms – Expert Pruning, Expert Editing, and Expert Merging – and show…
