Towards Efficient Large Vision-Language Models: A Comprehensive Survey on Inference Strategies
arXiv:2603.27960v2

Abstract: Although Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal reasoning capabilities, their scalability and deployment are constrained by massive computational requirements. In particular, the large number of visual tokens produced from high-resolution input data aggravates…
