Representation Collapse in Machine Translation Through the Lens of Angular Dispersion
arXiv:2602.17287v1 (Announce Type: cross)

Abstract: Modern neural translation models based on the Transformer architecture are known for their high performance, particularly when trained on high-resource datasets. A standard next-token prediction training strategy, while widely adopted in practice, may lead to…
