Orthogonalized Policy Optimization: Policy Optimization as Orthogonal Projection in Hilbert Space
arXiv:2601.12415v5 Announce Type: replace Abstract: We propose Orthogonalized Policy Optimization (OPO), a principled framework for large language model alignment derived from optimization in the Hilbert function space $L^2(\pi_k)$. Lifting policy updates from the probability simplex into $L^2(\pi_k)$ transforms the nonlinear…
