arXiv:2502.07620v3 Announce Type: replace
Abstract: The remarkable success of large-scale contrastive pre-training has been largely driven by vast yet static datasets. However, as the scaling paradigm evolves, it encounters a fundamental challenge when applied to dynamic data streams characterized by concept drift, i.e., unpredictable changes in the underlying data distribution. This paper aims to advance robust pre-training in such non-stationary environments. We begin by revealing that conventional contrastive pre-training methods are highly susceptible to concept drift, resulting in substantial bias and instability within the learned feature representations. To systematically analyze these effects, we develop a structural causal model that elucidates how drift acts as a confounder, distorting the learned representations. Based on these causal insights, we propose Resilient Contrastive Pre-training (RCP), a novel method that incorporates causal intervention. RCP formulates a causally-informed objective to mitigate drift-induced biases through targeted interventions. The method is designed for simple and scalable implementation and exhibits notable adaptability, promoting robust and autonomous pre-training on non-stationary data. Comprehensive experiments across various downstream tasks consistently demonstrate that RCP effectively alleviates the detrimental impact of concept drift, yielding more resilient and generalizable representations.
