Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
arXiv:2408.03459v5

Abstract: Large language models (LLMs) have demonstrated remarkable capabilities but often struggle to align with human preferences, leading to harmful or undesirable outputs. Preference learning, which trains models to distinguish between preferred and non-preferred responses based…
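Although the abstract is truncated here, the DPO objective the title refers to is standard: the policy is trained so that the log-probability margin between a preferred and a non-preferred response, relative to a frozen reference model, is pushed positive. A minimal sketch of the per-example DPO loss in plain Python (the function name and argument names are illustrative, not from the paper):

```python
import math

def dpo_loss(beta, logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected):
    """Per-example Direct Preference Optimization loss:

        -log sigmoid(beta * [(log pi(y_w) - log pi_ref(y_w))
                             - (log pi(y_l) - log pi_ref(y_l))])

    where y_w is the preferred and y_l the non-preferred response.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically stable -log(sigmoid(margin)):
    # for margin >= 0 use log1p(exp(-margin));
    # for margin < 0 rewrite as -margin + log1p(exp(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss is log 2; increasing the policy's preference for the chosen response drives the loss toward zero.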
