Decomposition of Small Transformer Models
arXiv:2511.08854v2
Abstract: Recent work in mechanistic interpretability has shown that decomposing models in parameter space may yield clean handles for analysis and intervention. Previous methods have demonstrated successful applications on a wide range of toy models, but…
