The Key to State Reduction in Linear Attention: A Rank-based Perspective
arXiv:2602.04852v2 Announce Type: replace

Abstract: Linear attention offers a computationally efficient yet expressive alternative to softmax attention. However, recent empirical results indicate that the hidden state of trained linear attention models often exhibits a low-rank structure, suggesting that these models…
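To make the setting concrete, the sketch below shows the standard linear attention recurrence, where a d×d hidden state accumulates rank-1 outer products of keys and values, and then measures the numerical rank of that state via SVD. This is a generic NumPy illustration of linear attention and the rank check, not code from the paper; dimensions and tolerances are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 64, 16  # sequence length, head dimension (illustrative values)

# Random queries, keys, and values for a single head.
Q = rng.normal(size=(T, d))
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))

# Linear attention maintains a d x d hidden state S_t = sum_{i<=t} k_i v_i^T,
# so each step costs O(d^2) regardless of sequence length T.
S = np.zeros((d, d))
outputs = []
for t in range(T):
    S += np.outer(K[t], V[t])   # rank-1 update to the hidden state
    outputs.append(Q[t] @ S)    # read out via the current query
outputs = np.stack(outputs)

# Numerical rank of the final state: singular values above a relative tolerance.
sv = np.linalg.svd(S, compute_uv=False)
num_rank = int((sv > sv[0] * 1e-6).sum())
print(num_rank)
```

With random inputs the accumulated state is full rank; the abstract's observation is that in *trained* models this state tends to concentrate its singular values, i.e. `num_rank` (or a softer effective-rank measure) is much smaller than d.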
