Archives AI News

MOCHA: Multi-modal Objects-aware Cross-arcHitecture Alignment

arXiv:2509.14001v1 Announce Type: cross Abstract: We introduce MOCHA (Multi-modal Objects-aware Cross-arcHitecture Alignment), a knowledge distillation approach that transfers region-level multimodal semantics from a large vision-language teacher (e.g., LLaVa) into a lightweight vision-only object detector student (e.g., YOLO). A translation module…