Achieving Logarithmic Regret in KL-Regularized Zero-Sum Markov Games
arXiv:2510.13060v1 Announce Type: new

Abstract: Reverse Kullback-Leibler (KL) divergence-based regularization with respect to a fixed reference policy is widely used in modern reinforcement learning to preserve the desired traits of the reference policy and sometimes to promote exploration (using uniform…
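The truncated abstract describes the core regularizer: a reverse KL divergence from the learned policy to a fixed reference policy, subtracted from the expected reward. A minimal sketch of that per-step objective for discrete action distributions is below; the function names and the regularization weight `eta` are illustrative, not from the paper.

```python
import numpy as np

def reverse_kl(pi, pi_ref):
    """Reverse KL divergence KL(pi || pi_ref) for discrete policies.

    "Reverse" means the expectation is taken under the learned policy pi,
    which penalizes pi for placing mass where pi_ref has little.
    """
    pi = np.asarray(pi, dtype=float)
    pi_ref = np.asarray(pi_ref, dtype=float)
    mask = pi > 0  # 0 * log(0/x) = 0 by convention
    return float(np.sum(pi[mask] * np.log(pi[mask] / pi_ref[mask])))

def regularized_objective(reward, pi, pi_ref, eta):
    """Expected reward minus eta times the reverse KL to the reference."""
    return float(np.dot(pi, np.asarray(reward, dtype=float))) - eta * reverse_kl(pi, pi_ref)
```

When the reference is the uniform policy, the reverse KL equals `log|A|` minus the policy's entropy, so the penalty acts as an entropy bonus and encourages exploration, which matches the abstract's parenthetical remark.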
