The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM
arXiv:2510.01650v2

Abstract: Neural network pruning is a promising technique to mitigate the excessive computational and memory requirements of large language models (LLMs). Despite this promise, progress in this area has diminished, as conventional methods are seemingly…
