Improved Bounds for Private and Robust Alignment
arXiv:2512.23816v1
Announce Type: new
Abstract: In this paper, we study the private and robust alignment of language models from a theoretical perspective by establishing upper bounds on the suboptimality gap in both offline and online settings. We consider preference labels…
