The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
arXiv:2507.08802v2 Announce Type: replace Abstract: The concept of causal abstraction has recently been popularised to demystify the opaque decision-making processes of machine learning models; in short, a neural network can be abstracted as a higher-level algorithm if there exists a function…
