Early alignment in two-layer networks training is a two-edged sword
arXiv:2401.10791v3 Announce Type: replace-cross Abstract: Training neural networks with first-order optimisation methods is at the core of the empirical success of deep learning. The scale of initialisation is a crucial factor, as small initialisations are generally associated with a…
