FlexAttention + FlashAttention-4: Fast and Flexible

2026-03-05 08:55 GMT

TL;DR: On Hopper and Blackwell GPUs, FlexAttention now has a FlashAttention-4 backend. We added support in PyTorch to automatically generate CuTeDSL score/mask modification functions, and to JIT-instantiate FlashAttention-4 for custom…