Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level
arXiv:2507.23512v2 Announce Type: replace

Abstract: Gradient clipping is a fundamental tool in Deep Learning, improving the high-probability convergence of stochastic first-order methods like SGD, AdaGrad, and Adam under heavy-tailed noise, which is common in training large language models. It is…
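To make the technique referred to in the abstract concrete, below is a minimal sketch of a clipped-SGD update: each stochastic gradient is rescaled so its norm does not exceed a clipping level before the step is taken. All names and parameters (`grad_fn`, `lr`, `clip_level`, `n_steps`) are illustrative assumptions, and this sketch does not show the paper's differentially private variant, which would additionally inject calibrated noise after clipping.

```python
import numpy as np

def clip_by_norm(g, clip_level):
    """Scale gradient g so its Euclidean norm is at most clip_level."""
    norm = np.linalg.norm(g)
    return g * min(1.0, clip_level / norm) if norm > 0 else g

def clipped_sgd(grad_fn, x0, lr=0.1, clip_level=1.0, n_steps=100):
    """Plain SGD where each stochastic gradient is clipped before the update.

    grad_fn(x) should return a (possibly heavy-tailed) stochastic gradient at x.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_fn(x)                             # stochastic gradient estimate
        x = x - lr * clip_by_norm(g, clip_level)   # clipped update
    return x
```

Clipping bounds the influence of any single (possibly heavy-tailed) gradient sample on the iterate, which is what underlies both the high-probability convergence guarantees mentioned in the abstract and, in the differentially private setting, the bounded per-sample sensitivity that privacy noise is calibrated to.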
