The Affine Divergence: Aligning Activation Updates Beyond Normalisation
arXiv:2512.22247v1

Abstract: A systematic mismatch exists between mathematically ideal and effective activation updates during gradient descent. As intended, parameters update in their direction of steepest descent. However, activations are argued to constitute a more directly impactful quantity…
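Because the abstract is truncated, the sketch below is only a hypothetical illustration of what such a mismatch can look like. It is not the authors' formulation. It assumes a single affine layer y = Wx + b trained with plain gradient descent at learning rate η, and the helper names (`affine`, `sgd_step`, `activation_update`) are our own. For an affine layer, dL/dW = g xᵀ and dL/db = g, where g = dL/dy. One parameter step therefore changes the activation for the same input by -η(‖x‖² + 1)g. Steepest descent taken directly on y would instead give -ηg. So the effective activation update is rescaled by a factor that depends on the input norm.

```python
# Hypothetical illustration (not the paper's code): one affine layer y = W x + b,
# updated by plain gradient descent, and the resulting change in y for the same x.

def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix W."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def affine(W, b, x):
    """Affine layer output y = W x + b."""
    return [y_i + b_i for y_i, b_i in zip(matvec(W, x), b)]

def sgd_step(W, b, x, g, lr):
    """One gradient-descent step on (W, b), given g = dL/dy at input x.

    For an affine layer, dL/dW = g x^T and dL/db = g.
    """
    W_new = [[w - lr * g_i * x_j for w, x_j in zip(row, x)] for row, g_i in zip(W, g)]
    b_new = [b_i - lr * g_i for b_i, g_i in zip(b, g)]
    return W_new, b_new

def activation_update(W, b, x, g, lr):
    """Effective change in y = W x + b at the same x after one parameter step."""
    y_old = affine(W, b, x)
    W_new, b_new = sgd_step(W, b, x, g, lr)
    y_new = affine(W_new, b_new, x)
    return [yn - yo for yn, yo in zip(y_new, y_old)]

def ideal_activation_update(g, lr):
    """Steepest-descent step taken directly on y: -lr * g."""
    return [-lr * g_i for g_i in g]

if __name__ == "__main__":
    W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    x, g, lr = [3.0, 4.0], [1.0, -2.0], 0.1
    effective = activation_update(W, b, x, g, lr)
    ideal = ideal_activation_update(g, lr)
    # Each component differs by the factor ||x||^2 + 1 = 26 for this input.
    print(effective, ideal)
```

With ‖x‖² = 25, the effective update is about 26 times the ideal one in every component. Doubling the input norm would change that factor again. This input dependence is the kind of discrepancy between parameter-space and activation-space descent that the abstract refers to.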
