Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model
arXiv:2511.20798v1 Announce Type: new Abstract: Recent advances in mechanistic interpretability have revealed that large language models (LLMs) develop internal representations corresponding not only to concrete entities but also distinct, human-understandable abstract concepts and behaviour. Moreover, these hidden features can be…
