TawPipe: Topology-Aware Weight Pipeline Parallelism for Accelerating Long-Context Large Models Training
arXiv:2511.09741v1 Announce Type: new

Abstract: Training large language models (LLMs) is fundamentally constrained by limited device memory and costly inter-device communication. Although pipeline parallelism alleviates memory pressure by partitioning models across devices, it incurs activation communication overhead that scales linearly…
