Pruning the Paradox: How CLIP’s Most Informative Heads Enhance Performance While Amplifying Bias
arXiv:2503.11103v3 (Announce Type: replace-cross)

Abstract: CLIP is one of the most popular foundation models and is heavily used for many vision-language tasks, yet little is known about its inner workings. As CLIP is increasingly deployed in real-world applications, it is…
