Sink-Token-Aware Pruning for Fine-Grained Video Understanding in Efficient Video LLMs
arXiv:2604.20937v1

Abstract: Video Large Language Models (Video LLMs) incur high inference latency due to the large number of visual tokens fed to the underlying LLM. To address this, training-free visual token pruning has emerged as a solution to reduce…
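Since the abstract is truncated before any method details, the following is only a minimal sketch of the general idea the title and opening gesture at: scoring visual tokens by the attention they receive and pruning the low-scoring ones, while handling attention-sink tokens (tokens that attract outsized attention regardless of content) separately so they neither dominate the ranking nor get dropped. Every function name, the z-score sink threshold, and the keep ratio below are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of sink-aware visual token pruning for a Video LLM.
# NOT the paper's method: names, thresholds, and keep_ratio are assumed.
import torch


def prune_visual_tokens(
    visual_tokens: torch.Tensor,   # (num_tokens, dim) visual token embeddings
    attn_to_visual: torch.Tensor,  # (num_queries, num_tokens) attention weights
    keep_ratio: float = 0.25,      # fraction of visual tokens to keep (assumed)
    sink_z_thresh: float = 3.0,    # z-score cutoff for flagging sinks (assumed)
) -> tuple[torch.Tensor, torch.Tensor]:
    """Keep top-scoring visual tokens while always retaining sink tokens."""
    # Average attention each visual token receives from the text queries.
    scores = attn_to_visual.mean(dim=0)  # (num_tokens,)

    # Flag sink tokens as extreme attention outliers.
    z = (scores - scores.mean()) / scores.std().clamp_min(1e-6)
    sink_mask = z > sink_z_thresh

    # Rank only the non-sink tokens, so sinks don't crowd out
    # genuinely informative tokens in the top-k selection.
    k = max(1, int(keep_ratio * visual_tokens.size(0)))
    ranking = scores.masked_fill(sink_mask, float("-inf"))
    topk = ranking.topk(k).indices

    # Final kept set: sinks (to keep attention stable) + top-k informative tokens.
    keep = sink_mask.clone()
    keep[topk] = True
    kept_idx = keep.nonzero(as_tuple=True)[0]
    return visual_tokens[kept_idx], kept_idx


if __name__ == "__main__":
    tokens = torch.randn(576, 1024)          # e.g. one frame's visual tokens
    attn = torch.rand(32, 576).softmax(-1)   # attention from 32 text queries
    kept, idx = prune_visual_tokens(tokens, attn)
    print(kept.shape, idx[:10])
```

The design choice sketched here, excluding sinks from the ranking but retaining them in the kept set, reflects the common observation that naive attention-score pruning either keeps sinks at the expense of content tokens or removes them and destabilizes the model; how the paper itself resolves this is not visible in the truncated abstract.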
