Accelerating 2K scale pre-training up to 1.28x with TorchAO, MXFP8 and TorchTitan on Crusoe B200 Cluster

2025-09-03 15:30 GMT · pytorch.org

tldr: 1.22x – 1.28x training acceleration with MXFP8, equivalent convergence compared to BF16. We recently worked with a Crusoe B200 cluster with 1856 GPUs, giving us a first look at…
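The headline speedup comes from swapping BF16 GEMMs for MXFP8 ones via TorchAO inside TorchTitan. As a rough illustration only, here is a minimal sketch of converting a model's linear layers to MXFP8 training with TorchAO's prototype MX-formats API; the import path, the `MXLinearConfig` fields, and the `quantize_` call are assumptions based on torchao's prototype documentation and are not necessarily the exact configuration used in the blog's TorchTitan runs.

```python
# Minimal sketch (assumed API): convert nn.Linear layers to MXFP8 training with TorchAO.
# MXLinearConfig lives in torchao's prototype mx_formats package; names and defaults
# may differ across torchao versions, and this is not the TorchTitan integration itself.
import torch
import torch.nn as nn

from torchao.quantization import quantize_
from torchao.prototype.mx_formats import MXLinearConfig  # prototype API, assumed path

# Toy stand-in for a transformer MLP. MX scaling works on blocks of 32 elements,
# so inner dimensions should be divisible by 32; the hardware-accelerated MXFP8
# path needs a Blackwell-class GPU such as the B200.
model = nn.Sequential(
    nn.Linear(4096, 14336, bias=False),
    nn.SiLU(),
    nn.Linear(14336, 4096, bias=False),
).cuda().to(torch.bfloat16)

# MXFP8 recipe: FP8 E4M3 elements with one shared E8M0 scale per 32-element block.
config = MXLinearConfig(elem_dtype=torch.float8_e4m3fn, block_size=32)

# Swap eligible linear layers so their matmuls run in MXFP8 while master weights
# and optimizer state stay in higher precision; the training loop is unchanged.
quantize_(model, config)
```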

Original: https://pytorch.org/blog/accelerating-2k-scale-pre-training-up-to-1-28x-with-torchao-mxfp8-and-torchtitan-on-crusoe-b200-cluster/