pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models
arXiv:2507.05394v3 (Announce Type: replace-cross)

Abstract: Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable generalization in zero- and few-shot settings, but adapting them efficiently to decentralized, heterogeneous data remains a challenge. While prompt tuning has emerged as a popular parameter-efficient approach…
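The abstract is cut off before it describes pFedMMA's mechanism, so the sketch below is only a generic illustration of parameter-efficient federated fine-tuning. It is not the paper's algorithm. It assumes that each client keeps its VLM backbone frozen and trains only small adapter vectors. The server averages a designated subset of those vectors FedAvg-style, weighting each client by its sample count. The remaining vectors stay on the client as its personalized part. The key names `shared_proj` and `local_adapter` are hypothetical placeholders.

```python
from typing import Dict, List

# Hypothetical adapter state: parameter name -> flat vector of floats.
# The frozen VLM backbone is assumed never to be communicated, so it is not modeled here.
Params = Dict[str, List[float]]


def weighted_average(vectors: List[List[float]], weights: List[float]) -> List[float]:
    """Element-wise weighted mean of equally sized vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]


def aggregate_shared(
    client_params: List[Params], num_samples: List[int], shared_keys: List[str]
) -> Params:
    """FedAvg-style aggregation restricted to the keys designated as shared."""
    weights = [float(n) for n in num_samples]
    return {
        k: weighted_average([p[k] for p in client_params], weights)
        for k in shared_keys
    }


def personalize(client: Params, global_shared: Params) -> Params:
    """Overwrite the client's shared entries with the global ones; keep all other keys local."""
    updated = {k: list(v) for k, v in client.items()}
    updated.update({k: list(v) for k, v in global_shared.items()})
    return updated


if __name__ == "__main__":
    clients = [
        {"shared_proj": [1.0, 2.0], "local_adapter": [0.1]},
        {"shared_proj": [3.0, 4.0], "local_adapter": [0.9]},
    ]
    global_shared = aggregate_shared(clients, [1, 3], ["shared_proj"])
    print(global_shared)  # {'shared_proj': [2.5, 3.5]}
    print([personalize(c, global_shared) for c in clients])
```

With this split, only the shared vectors cross the network in each round. Local vectors can drift toward each client's own data distribution. The actual partition between shared and local parameters used by pFedMMA is not stated in the excerpt above.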
