Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
arXiv:2503.07680v3 Announce Type: replace

Abstract: Training Long-Context Large Language Models (LLMs) is challenging, as hybrid training with long-context and short-context data often leads to workload imbalances. Existing works mainly use data packing to alleviate this issue, but fail to consider…
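To make the data-packing idea concrete, here is a minimal, generic sketch of length-balanced packing: sequences of mixed lengths are greedily assigned to fixed-capacity packs so that per-pack token counts stay roughly even. This is an illustrative first-fit-decreasing heuristic, not the paper's hierarchical algorithm; the function name `balance_pack` and the example lengths are hypothetical.

```python
# Hypothetical sketch: greedy length-balanced packing of training sequences.
# Not the paper's method; a generic first-fit-decreasing illustration of how
# packing mitigates workload imbalance between long and short sequences.

def balance_pack(lengths, capacity):
    """Pack sequence lengths into bins of at most `capacity` tokens,
    preferring the currently lightest bin that still has room."""
    bins = []  # each bin: [total_tokens, [lengths...]]
    for n in sorted(lengths, reverse=True):
        # candidate bins with enough remaining room for this sequence
        candidates = [b for b in bins if b[0] + n <= capacity]
        if candidates:
            b = min(candidates, key=lambda b: b[0])  # least-loaded bin
            b[0] += n
            b[1].append(n)
        else:
            bins.append([n, [n]])  # open a new pack
    return bins

# Mixed long/short sequence lengths (hypothetical), 8K-token packs.
packs = balance_pack([8000, 512, 512, 4000, 256, 7000, 1024], capacity=8192)
for total, seqs in packs:
    print(total, seqs)
```

With these inputs the heuristic yields three packs whose token totals are far closer together than a naive long/short split would produce, which is the imbalance that packing-based approaches target.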
