SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training
arXiv:2505.21893v2 Announce Type: replace-cross Abstract: Preference learning has become a central technique for aligning generative models with human expectations. Recently, it has been extended to diffusion models through methods like Direct Preference Optimization (DPO). However, existing approaches such as Diffusion-DPO…
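The abstract is truncated, so SDPO's importance-sampling correction is not spelled out here. For context, the Direct Preference Optimization objective that Diffusion-DPO builds on can be sketched as follows; this is a minimal illustration of the standard DPO loss for one preference pair, not the paper's method, and the function name and arguments are my own:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    logp_w / logp_l: policy log-probs of the preferred / dispreferred sample.
    ref_logp_w / ref_logp_l: the same log-probs under a frozen reference model.
    beta: temperature controlling deviation from the reference.
    """
    # Implicit-reward margin: how much more the policy favors the winner
    # over the loser, relative to the reference model.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# At zero margin the loss equals log(2); a positive margin drives it lower.
print(dpo_loss(-1.0, -2.0, -1.5, -1.5) < math.log(2))
```

Diffusion-DPO applies this pairwise objective to diffusion models by substituting per-step denoising losses for the sequence log-probabilities.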
