Representation-Level Counterfactual Calibration for Debiased Zero-Shot Recognition
arXiv:2510.26466v2 Announce Type: replace-cross
Abstract: Object-context shortcuts remain a persistent challenge in vision-language models, undermining zero-shot reliability when test-time scenes differ from familiar training co-occurrences. We recast this issue as a causal inference problem and ask: Would the prediction remain…
