Masking Teacher and Reinforcing Student for Distilling Vision-Language Models
arXiv:2512.22238v1
Abstract: Large-scale vision-language models (VLMs) have recently achieved remarkable multimodal understanding, but their massive size makes them impractical for deployment on mobile or edge devices. This raises the need for compact yet capable VLMs that can…
