Pruning Deep Neural Networks via the Marchenko–Pastur Distribution

2026-06-02 19:00 GMT · aimagpro.com

arXiv:2606.02608v1 Announce Type: new
Abstract: We study a Marchenko–Pastur (MP) random-matrix approach to pruning deep neural networks with very small post-pruning fine-tuning budgets. The main practical contribution is accuracy retention under short calibration and fine-tuning schedules, rather than a long post-pruning reoptimization pipeline. The theory gives deterministic data-path certificates: if the removed component $R$ has small propagated logit effect $L_s \| R \psi_1(s) \|_\infty$, pruning decreases an elastic-net objective and preserves samples whose dense margin exceeds twice the perturbation. The zero-budget case gives perfect pruning; a prune–restore extension models weight restoration inside a fixed sparse-execution pattern; and an additive $L_2$-regularized model shows admissible random-like components vanish at the training limit, with persistent spikes stabilizing as the MP bulk collapses. Under iid-Gaussian sufficient conditions, the fitted MP edge $\sigma_+$ gives a high-probability layerwise budget signal.
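As a rough illustration of the two ingredients named above, the following sketch computes a Marchenko–Pastur upper edge and applies the margin test. It makes several assumptions that the abstract does not confirm: the edge is taken to be the standard asymptotic bound $\sigma(\sqrt{p}+\sqrt{n})$ on the largest singular value of a $p \times n$ matrix with i.i.d. zero-mean entries of standard deviation $\sigma$, which may differ from the paper's fitted $\sigma_+$; the perturbation `eps` stands in for $L_s \| R \psi_1(s) \|_\infty$ and is simply passed in as a number; and all function names are hypothetical. The margin test itself rests on an elementary fact: if every logit moves by at most `eps`, the gap between any two logits moves by at most `2 * eps`.

```python
import math


def mp_upper_edge(sigma, rows, cols):
    """Asymptotic largest singular value of a rows x cols matrix whose
    entries are i.i.d., zero-mean, with standard deviation sigma.
    (Assumed stand-in for the paper's fitted edge, not its definition.)"""
    return sigma * (math.sqrt(rows) + math.sqrt(cols))


def count_spikes(singular_values, edge):
    """Count singular values strictly above the bulk edge."""
    return sum(1 for s in singular_values if s > edge)


def margin_certified(logits, eps):
    """True if the top-1 class cannot change when each logit is
    perturbed by at most eps in absolute value."""
    top, runner_up = sorted(logits, reverse=True)[:2]
    return (top - runner_up) > 2.0 * eps


# Toy numbers, chosen only for illustration.
edge = mp_upper_edge(1.0, 4, 9)                      # 1.0 * (2 + 3)
spikes = count_spikes([7.2, 5.5, 4.9, 3.1], edge)    # values above the edge
safe = margin_certified([3.0, 1.0, 0.5], eps=0.9)    # margin 2.0 vs 1.8
```

In this toy setting, singular values above `edge` would be treated as signal ("spikes") and those below as candidates for removal; how the paper turns the edge into a per-layer budget is not specified in the abstract.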
On ImageNet-1k, after only three distillation epochs, ViT-B/16 $2{:}4{+}$ToMe reaches $83.41\%$ top-1 ($-1.70$ pp from dense) at $59.81\%$ sparse-execution MAC reduction, with $1.388\times$ best-observed A40 native-$2{:}4$ backend speedup for the same checkpoint and ToMe graph; a separate no-ToMe A100 endpoint gives $2.705\times$. At structured sparsity, ViT-B/16 $6{:}12$ reaches $83.74\%$, ViT-L/16 $8{:}16$ dense+permutation reaches $85.33\%$ ($-0.51$ pp), and ConvNeXtV2-Base $12{:}16$ reaches $86.35\%$ ($-0.37$ pp). For CNNs, ResNet50 $8{:}16$ dense+permutation reaches $75.87\%$ ($-0.26$ pp), and ResNet152d CAST-conv+permutation reaches $81.33\%$ ($-1.53$ pp) at ${\sim}50\%$ MAC accounting with a $1.62\times$ A40 im2col$+2{:}4$ sparse-GEMM audit.
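For readers unfamiliar with the $N{:}M$ notation in these results, the sketch below builds such a mask for a flat list of weights. It assumes the common hardware convention that $N{:}M$ means keeping $N$ nonzero weights in every group of $M$ consecutive weights, and it selects survivors by magnitude. Both are assumptions for illustration: the abstract does not state which convention it uses, and the paper's selection rule is MP-based rather than plain magnitude. The function names are hypothetical.

```python
def nm_mask(weights, n_keep, group):
    """Return a 0/1 mask keeping the n_keep largest-magnitude weights
    in each consecutive block of `group` weights."""
    if len(weights) % group != 0:
        raise ValueError("length must be a multiple of the group size")
    mask = []
    for start in range(0, len(weights), group):
        block = weights[start:start + group]
        # Indices of the block sorted by decreasing magnitude;
        # ties are broken in favour of the earlier index.
        order = sorted(range(group), key=lambda i: -abs(block[i]))
        keep = set(order[:n_keep])
        mask.extend(1 if i in keep else 0 for i in range(group))
    return mask


def apply_mask(weights, mask):
    """Zero out the weights whose mask entry is 0."""
    return [w if m else 0.0 for w, m in zip(weights, mask)]


row = [0.1, -0.9, 0.5, 0.2, 1.0, 0.0, -0.3, 0.4]
mask = nm_mask(row, n_keep=2, group=4)
pruned = apply_mask(row, mask)
```

Under this convention a $2{:}4$ mask removes exactly half of the weights in a fixed pattern, which is what allows sparse matrix hardware to skip them.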