arXiv:2509.00083v1 Announce Type: new Abstract: Modern generative models risk overfitting and unintentionally memorizing rare training examples, which can be extracted by adversaries or inflate benchmark performance. We propose Generative Data Cartography (GenDataCarto), a data-centric framework that assigns each pretraining sample a difficulty score (early-epoch loss) and a memorization score (frequency of "forget events"), then partitions examples into four quadrants to guide targeted pruning and up-/down-weighting. We prove that our memorization score lower-bounds classical influence under smoothness assumptions and that down-weighting high-memorization hotspots provably decreases the generalization gap via uniform stability bounds. Empirically, GenDataCarto reduces synthetic canary extraction success by over 40% at just 10% data pruning, while increasing validation perplexity by less than 0.5%. These results demonstrate that principled data interventions can dramatically mitigate leakage with minimal cost to generative performance.
Original: https://arxiv.org/abs/2509.00083
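The abstract's two-score quadrant partitioning can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the function names, the definition of a "forget event" as any epoch-over-epoch loss increase, and the median thresholds are all assumptions made for the example.

```python
import numpy as np

def forget_events(loss_history, eps=0.0):
    """Count 'forget events': epochs where a sample's loss increases by more than eps.
    (The paper's exact forget-event definition may differ; this is an assumption.)"""
    deltas = np.diff(np.asarray(loss_history, dtype=float))
    return int(np.sum(deltas > eps))

def gen_data_carto(loss_matrix, early_epochs=3):
    """loss_matrix: (n_samples, n_epochs) per-sample training loss.
    Returns per-sample difficulty and memorization scores plus a quadrant label."""
    losses = np.asarray(loss_matrix, dtype=float)
    # Difficulty score: mean loss over the first few epochs, per the abstract.
    difficulty = losses[:, :early_epochs].mean(axis=1)
    # Memorization score: frequency of forget events over training.
    memorization = np.array([forget_events(row) for row in losses])
    # Median split into four quadrants (thresholds are an illustrative choice).
    d_thr, m_thr = np.median(difficulty), np.median(memorization)
    quadrants = np.where(
        difficulty >= d_thr,
        np.where(memorization >= m_thr, "hard/memorized", "hard/stable"),
        np.where(memorization >= m_thr, "easy/memorized", "easy/stable"),
    )
    return difficulty, memorization, quadrants
```

Under this sketch, the "hard/memorized" quadrant would be the natural target for the paper's pruning and down-weighting interventions, since those samples combine high early-epoch loss with frequent forgetting.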
