Clapping: Removing Per-sample Storage for Pipeline Parallel Distributed Optimization with Communication Compression
arXiv:2509.19029v1 | Announce Type: cross

Abstract: Pipeline-parallel distributed optimization is essential for large-scale machine learning but is challenged by significant communication overhead from transmitting high-dimensional activations and gradients between workers. Existing approaches often depend on impractical unbiased gradient assumptions or incur…
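The abstract is truncated, but the communication bottleneck it describes is commonly attacked with compressors such as top-k sparsification of the activations and gradients exchanged between pipeline stages. The sketch below is a generic illustration of a biased top-k compressor, not the paper's Clapping method; the function names and shapes are hypothetical.

```python
import numpy as np

def topk_compress(x, k):
    """Top-k sparsification: keep the k largest-magnitude entries
    and drop the rest (a *biased* compressor, since the expected
    output is not the input)."""
    flat = x.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    values = flat[idx]
    # Only (idx, values) need to cross the wire; shape is metadata.
    return idx, values, x.shape

def topk_decompress(idx, values, shape):
    """Reconstruct a dense tensor with zeros in the dropped slots."""
    out = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    out[idx] = values
    return out.reshape(shape)

rng = np.random.default_rng(0)
activation = rng.standard_normal((4, 256))   # a per-layer activation batch
idx, vals, shape = topk_compress(activation, k=64)
recovered = topk_decompress(idx, vals, shape)

# Bytes sent (indices + values) vs. the dense tensor.
ratio = (idx.nbytes + vals.nbytes) / activation.nbytes
print(f"compression ratio: {ratio:.3f}")  # → compression ratio: 0.125
```

Because top-k is biased, naive use breaks convergence guarantees that assume unbiased gradients, which is precisely the assumption the abstract calls impractical; methods in this line of work typically pair such compressors with error feedback or related correction mechanisms.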
