ADPO: Anchored Direct Preference Optimization
arXiv:2510.18913v1 Announce Type: new Abstract: Anchored Direct Preference Optimization (ADPO) is a unified framework that generalizes Direct Preference Optimization (DPO) with soft preferences, reference-policy anchoring, and groupwise extensions. While standard DPO assumes hard binary labels and pairwise comparisons, ADPO introduces:…
