Self-Attention as Distributional Projection: A Unified Interpretation of Transformer Architecture
arXiv:2511.13780v1

Abstract: This paper presents a mathematical interpretation of self-attention by connecting it to distributional semantics principles. We show that self-attention emerges from projecting corpus-level co-occurrence statistics into sequence context. Starting from the co-occurrence matrix underlying GloVe…
