Rethinking Layer Redundancy in Large Language Models: Calibration Objectives and Search for Depth Pruning
arXiv:2604.24938v1 (announce type: new)
Abstract: Depth pruning improves the inference efficiency of large language models by removing Transformer blocks. Prior work has focused on importance criteria and search algorithms, often treating layer redundancy as an inherent structural property of pretrained…
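To make the idea of removing Transformer blocks concrete, here is a minimal sketch of depth pruning on a toy residual stack. It is not the method of this paper, whose details the truncated abstract does not give. The importance criterion (cosine similarity between a block's input and output, one heuristic of the kind the abstract calls "importance criteria"), the toy block definition, the calibration vectors, and all function names are assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def make_block(scale):
    """Hypothetical toy residual block on 2-D vectors: x + scale * f(x).

    f rotates x by 90 degrees, so a small scale means the block barely
    changes its input (a stand-in for a "redundant" layer).
    """
    def block(x):
        a, b = x
        return [a - scale * b, b + scale * a]
    return block

def block_redundancy(blocks, calib):
    """Mean input/output cosine similarity of each block over a calibration set."""
    scores = [0.0] * len(blocks)
    for x in calib:
        h = x
        for i, blk in enumerate(blocks):
            out = blk(h)
            scores[i] += cosine(h, out)
            h = out  # feed the output forward, as in a real layer stack
    return [s / len(calib) for s in scores]

def prune_depth(blocks, scores, n_remove):
    """Drop the n_remove blocks with the highest redundancy score."""
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    drop = set(ranked[:n_remove])
    kept = [blk for i, blk in enumerate(blocks) if i not in drop]
    return kept, sorted(drop)

# Assumed toy setup: four blocks, two of which are near-identity maps.
blocks = [make_block(s) for s in (0.9, 0.01, 0.5, 0.02)]
calib = [[1.0, 0.0], [0.5, 2.0]]  # hypothetical calibration inputs

scores = block_redundancy(blocks, calib)
kept, dropped = prune_depth(blocks, scores, n_remove=2)
print("redundancy scores:", [round(s, 4) for s in scores])
print("dropped block indices:", dropped)
```

In this toy setup the score of each block depends only on its scale, so the two near-identity blocks are the ones removed. In a real model the scores depend on the calibration data and on which blocks have already been removed, which is why the choice of criterion and search procedure matters.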
