Archives AI News

Decoupling Positional and Symbolic Attention Behavior in Transformers

arXiv:2511.11579v1 Announce Type: new Abstract: An important aspect underlying language understanding and production is the ability to independently encode the positional and symbolic information of the words within a sentence. In Transformers, positional information is typically encoded using Positional Encodings (PEs).…
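For context on the PEs the abstract refers to: the standard way to inject positional information (not this paper's method, just the classic sinusoidal scheme from "Attention Is All You Need") is to add a position-dependent vector to each token's symbolic embedding. A minimal sketch:

```python
import numpy as np

def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    """Classic sinusoidal positional encodings (d_model must be even).

    Each position gets a deterministic vector of interleaved
    sines and cosines at geometrically spaced frequencies.
    """
    positions = np.arange(seq_len)[:, None]        # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # shape (1, d_model // 2)
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions: cosine
    return pe

pe = sinusoidal_pe(seq_len=16, d_model=64)
# The model's input is then token_embedding + pe, so positional and
# symbolic information share one vector -- the entanglement the paper
# is concerned with decoupling.
```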

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

arXiv:2410.01623v3 Announce Type: replace Abstract: Low-rank training has emerged as a promising approach for reducing memory usage in training Large Language Models (LLMs). Previous methods either rely on decomposing weight matrices (e.g., LoRA), or seek to decompose gradient matrices (e.g.,…
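To illustrate the weight-decomposition family the abstract contrasts against (a generic LoRA-style sketch, not Fira's own method): the pretrained weight W is frozen and a trainable update of rank at most r is parameterized as the product of two thin matrices, so only 2·d·r parameters are trained instead of d².

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                          # full width and low rank (r << d)

W = rng.normal(size=(d, d))           # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                  # trainable up-projection, zero-init
                                      # so the update starts as exactly 0

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Effective weight is W + B @ A; since rank(B @ A) <= r,
    # the trainable update lives in a low-rank subspace.
    return x @ W.T + x @ (B @ A).T

x = rng.normal(size=(2, d))
y = lora_forward(x)                   # equals x @ W.T until B is updated
```

Methods like Fira instead ask whether the effective training dynamics can recover full-rank behavior while keeping a low-rank memory footprint.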