Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks

2026-02-24 20:00 GMT · aimagpro.com

arXiv:2602.15997v3 Announce Type: replace
Abstract: Capability emergence during neural network training remains mechanistically opaque. We track five geometric measures across five model scales (405K–85M parameters), 120 task$\times$level$\times$model combinations (119 achieving accuracy-based emergence) across eight algorithmic tasks, and three Pythia language models (160M–2.8B). We find: (1) training begins with a universal representation collapse to task-specific floors that are scale-invariant across a 210$\times$ parameter range (e.g., modular arithmetic collapses to RankMe $\approx$ 2.0 regardless of model size); (2) collapse propagates top-down through layers (28/32 task$\times$model consistency), contradicting bottom-up feature-building intuition; (3) a geometric hierarchy in which representation geometry leads emergence (100% precursor rate for hard tasks across all model sizes), while the local learning coefficient is synchronous (0/24 precursor) and Hessian measures lag. We also delineate prediction limits: geometric measures encode coarse task difficulty but not fine-grained timing (within-class concordance ranges from 52% for easy tasks to 69% for hard tasks; when task ordering reverses across scales, prediction fails at 26%). On Pythia, global geometric patterns replicate but per-task precursor signals do not, as the precursor relationship requires task–training alignment that naturalistic pre-training does not provide. Our contribution is the geometric anatomy of emergence and its boundary conditions, not a prediction tool.
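For readers unfamiliar with the RankMe measure cited above, the following is a minimal sketch of the standard definition: the exponential of the Shannon entropy of the normalized singular values of an embedding matrix. The function name `rankme`, the `eps` value, and the synthetic activation matrices are assumptions made for illustration; the abstract does not specify the authors' implementation, layers, or preprocessing.

```python
import numpy as np


def rankme(embeddings: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank of an (n_samples, dim) matrix.

    Computed as exp(H(p)), where p is the singular-value spectrum
    normalized to sum to one (plus a small eps for numerical stability).
    """
    sigma = np.linalg.svd(embeddings, compute_uv=False)
    p = sigma / sigma.sum() + eps
    return float(np.exp(-np.sum(p * np.log(p))))


rng = np.random.default_rng(0)

# Hypothetical "collapsed" activations: 64-dimensional vectors whose
# variance lives in only 2 directions, so the effective rank is at most ~2.
collapsed = rng.normal(size=(512, 2)) @ rng.normal(size=(2, 64))

# Hypothetical "spread" activations: isotropic Gaussian in all 64 dimensions.
spread = rng.normal(size=(512, 64))

print("collapsed:", rankme(collapsed))
print("spread:   ", rankme(spread))
```

Under this definition, a value near 2.0 means the representation behaves as if it occupied about two equally weighted directions, which is the sense in which the abstract reports a collapse floor that does not depend on model size.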