K-frames: Scene-Driven Any-k Keyframe Selection for long video understanding
arXiv:2510.13891v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have demonstrated significant capabilities in image understanding, but long-video are constrained by context windows and computational cost. Uniform frame sampling often leads to substantial information loss. Meanwhile existing keyframe selection…
