Semi-Supervised Preference Optimization with Limited Feedback
arXiv:2511.00040v1
Abstract: The field of preference optimization has made outstanding contributions to the alignment of language models with human preferences. Despite these advancements, recent methods still rely heavily on large amounts of paired (labeled) feedback data, leading to substantial…
