Exact Causal Attention with 10% Fewer Operations
arXiv:2510.05175v1

Abstract: We present Fast Causal Attention (FCA), an algorithm that computes exact Causal Attention using 10% fewer operations. FCA accelerates a special class of matrix multiplications where either one operand or the output matrix is upper-…
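The sketch below is background for the abstract's description. It is a minimal, stdlib-only Python version of standard exact causal attention, written so that the triangular structure is explicit. With queries as rows, the score product Q K^T is only needed on its lower triangle (key j <= query i), and the probability matrix P that multiplies V is lower-triangular. Transposing the convention makes both upper-triangular. These two products appear to fit the "one operand or the output matrix" class the abstract names. However, this sketch only skips the masked entries, which is the usual saving available to any causal implementation. It is not FCA's algorithm, which the excerpt does not describe. All function names (`causal_scores`, `row_softmax`, `lower_triangular_times`, `causal_attention`, `dense_causal_attention`) are hypothetical.

```python
import math

# Hypothetical illustration: exact causal attention that skips the masked
# triangle. This is the standard baseline saving, NOT the FCA algorithm.
Matrix = list[list[float]]


def causal_scores(Q: Matrix, K: Matrix) -> tuple[Matrix, int]:
    """Scaled Q K^T computed only where key j <= query i (triangular output).

    Masked entries (j > i) are set to -inf without being computed.
    Returns the scores and the number of scalar multiplications performed.
    """
    n, d = len(Q), len(Q[0])
    scale = 1.0 / math.sqrt(d)
    S = [[-math.inf] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(i + 1):  # only the causal (lower) triangle
            S[i][j] = sum(Q[i][t] * K[j][t] for t in range(d)) * scale
            mults += d
    return S, mults


def row_softmax(S: Matrix) -> Matrix:
    """Stable per-row softmax; exp(-inf) == 0.0, so masked keys get zero weight."""
    P = []
    for row in S:
        m = max(row)  # finite: every row contains at least its diagonal score
        e = [math.exp(x - m) for x in row]
        z = sum(e)
        P.append([x / z for x in e])
    return P


def lower_triangular_times(P: Matrix, V: Matrix) -> tuple[Matrix, int]:
    """P @ V for lower-triangular P (triangular operand): skip zeros P[i][j], j > i."""
    n, dv = len(P), len(V[0])
    O = [[0.0] * dv for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(i + 1):
            for c in range(dv):
                O[i][c] += P[i][j] * V[j][c]
            mults += dv
    return O, mults


def causal_attention(Q: Matrix, K: Matrix, V: Matrix) -> tuple[Matrix, int]:
    """Exact causal attention; returns the output and total scalar multiplications."""
    S, m1 = causal_scores(Q, K)
    O, m2 = lower_triangular_times(row_softmax(S), V)
    return O, m1 + m2


def dense_causal_attention(Q: Matrix, K: Matrix, V: Matrix) -> tuple[Matrix, int]:
    """Reference: full products with the mask applied afterwards (n^2 (d + dv) mults)."""
    n, d, dv = len(Q), len(Q[0]), len(V[0])
    scale = 1.0 / math.sqrt(d)
    full = [[sum(Q[i][t] * K[j][t] for t in range(d)) * scale for j in range(n)]
            for i in range(n)]
    S = [[full[i][j] if j <= i else -math.inf for j in range(n)] for i in range(n)]
    P = row_softmax(S)
    O = [[sum(P[i][j] * V[j][c] for j in range(n)) for c in range(dv)] for i in range(n)]
    return O, n * n * (d + dv)
```

On a 3-token input with d = d_v = 2, the triangle-aware version uses n(n+1)/2 · (d + d_v) = 24 scalar multiplications, compared with n^2 · (d + d_v) = 36 for the dense reference. The ratio tends to 1/2 as n grows. The excerpt does not state which baseline FCA's 10% figure is measured against.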
