Generating State-of-the-Art GEMMs with TorchInductor’s CuteDSL backend

2026-04-06 22:00 GMT · aimagpro.com

Introduction

TorchInductor currently supports three autotuning backends for matrix multiplications: Triton, CUTLASS (C++), and cuBLAS. This post describes the integration of CuteDSL as a fourth backend, the technical motivation for…
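An autotuning backend contributes candidate kernels. The autotuner benchmarks each candidate on the actual problem and keeps the fastest correct one. The sketch below shows only that selection loop. It is not TorchInductor's autotuner. The two pure-Python loop orders (`matmul_ijk`, `matmul_ikj`) are hypothetical stand-ins for kernels that backends such as Triton, CUTLASS, cuBLAS, or CuteDSL would generate, and `autotune` is a hypothetical helper name.

```python
import time
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def matmul_ijk(a: Matrix, b: Matrix) -> Matrix:
    # Hypothetical candidate 1: naive i-j-k loop order with a scalar accumulator.
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


def matmul_ikj(a: Matrix, b: Matrix) -> Matrix:
    # Hypothetical candidate 2: same math, i-k-j loop order (row-wise accumulation).
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        row = out[i]
        for p in range(k):
            aip, bp = a[i][p], b[p]
            for j in range(m):
                row[j] += aip * bp[j]
    return out


def autotune(
    candidates: Dict[str, Callable[[Matrix, Matrix], Matrix]],
    a: Matrix,
    b: Matrix,
    reference: Matrix,
    repeats: int = 3,
    tol: float = 1e-9,
) -> Tuple[str, Dict[str, float]]:
    """Benchmark every candidate and return (fastest correct name, timings)."""
    timings: Dict[str, float] = {}
    for name, fn in candidates.items():
        # Warm-up run doubles as a correctness check against the reference.
        result = fn(a, b)
        wrong = any(
            abs(x - y) > tol
            for r1, r2 in zip(result, reference)
            for x, y in zip(r1, r2)
        )
        if wrong:
            continue  # Reject candidates that produce incorrect results.
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(a, b)
            best = min(best, time.perf_counter() - start)
        timings[name] = best  # Keep the best (least noisy) time.
    if not timings:
        raise RuntimeError("no candidate produced a correct result")
    winner = min(timings, key=timings.get)
    return winner, timings


if __name__ == "__main__":
    a = [[float(i + j) for j in range(32)] for i in range(32)]
    b = [[float((i * j) % 7) for j in range(32)] for i in range(32)]
    ref = matmul_ijk(a, b)
    winner, timings = autotune({"ijk": matmul_ijk, "ikj": matmul_ikj}, a, b, ref)
    print(winner, timings)
```

Which candidate wins depends on the machine, so the printed winner can vary between runs. Adding a backend, in this simplified picture, just means contributing more entries to the `candidates` dictionary.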