SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
arXiv:2509.24006v2 Announce Type: replace

Abstract: In Diffusion Transformer (DiT) models, particularly for video generation, attention latency is a major bottleneck due to long sequence lengths and the quadratic complexity of attention. We find that attention weights can be separated into two…
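The abstract contrasts quadratic-cost attention with cheaper alternatives. As background (not the paper's SLA kernel, which combines sparse and linear components), here is a minimal NumPy sketch of why standard softmax attention is O(n^2) in sequence length while kernelized linear attention avoids materializing the n-by-n score matrix; the feature map `phi` is an illustrative choice:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an (n, n) score matrix -> O(n^2) in sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized linear attention: reassociates to phi(Q) @ (phi(K)^T V),
    # so the only matrix formed is (d, d_v), independent of n -> O(n) overall.
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                     # (d, d_v)
    Z = Qp @ Kp.sum(axis=0)           # per-query normalizer, shape (n,)
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.standard_normal((n, d)), rng.standard_normal((n, d)), rng.standard_normal((n, d))
out_soft = softmax_attention(Q, K, V)    # exact, quadratic
out_lin = linear_attention(Q, K, V)      # approximate, linear
```

Both paths return an (n, d) output; linear attention trades exactness of the softmax weights for cost that scales linearly with sequence length, which is the efficiency axis the abstract is addressing.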
