Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm

2025-12-10 20:00 GMT · aimagpro.com

arXiv:2505.15138v2 Announce Type: replace
Abstract: This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) with general parametrization. We propose a Primal-Dual Natural Actor-Critic algorithm that adeptly manages constraints while ensuring a high convergence rate. In particular, our algorithm achieves global convergence and constraint violation rates of $\tilde{\mathcal{O}}(1/\sqrt{T})$ over a horizon of length $T$ when the mixing time, $\tau_{\mathrm{mix}}$, is known to the learner. In the absence of knowledge of $\tau_{\mathrm{mix}}$, the achievable rates change to $\tilde{\mathcal{O}}(1/T^{0.5-\epsilon})$, provided that $T \geq \tilde{\mathcal{O}}\left(\tau_{\mathrm{mix}}^{2/\epsilon}\right)$. Our results match the theoretical lower bound for Markov Decision Processes and establish a new benchmark in the theoretical exploration of average reward CMDPs.
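To give a feel for the primal-dual mechanism the abstract refers to, below is a minimal sketch of Lagrangian primal-dual updates on a toy one-parameter constrained problem. This is an illustrative assumption, not the paper's algorithm: the actual method uses a natural actor-critic with general policy parametrization in an average-reward CMDP, whereas here the problem, the sigmoid policy `p = sigmoid(theta)`, the budget `b`, and the exact gradients are all hypothetical stand-ins chosen so the saddle-point dynamics are visible in a few lines.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy constrained problem (hypothetical, NOT the paper's CMDP setting):
# a one-parameter policy p = sigmoid(theta) earns reward r(p) = p but
# incurs cost c(p) = p, subject to the budget c(p) <= b.  The optimum
# saturates the constraint at p = b.
b = 0.5                      # cost budget (assumed for illustration)
theta, lam = 0.0, 0.0        # primal (policy) and dual (multiplier) variables
p_sum, T = 0.0, 20000

for t in range(1, T + 1):
    eta = 0.5 / math.sqrt(t)             # decaying step size, 1/sqrt(t) flavor
    p = sigmoid(theta)
    g = p * (1.0 - p)                    # d p / d theta
    # Lagrangian L = r(p) - lam * (c(p) - b):
    # gradient ascent in theta, projected dual ascent on the violation c(p) - b
    theta += eta * (g - lam * g)
    lam = max(0.0, lam + eta * (p - b))  # projection keeps the multiplier >= 0
    p_sum += p

avg_p = p_sum / T  # averaged iterate; rates like O(1/sqrt(T)) bound such averages
```

The averaged iterate `avg_p` drifts toward the budget-saturating policy `p = b`, while `lam` settles near the shadow price of the constraint; convergence-rate results of the kind stated in the abstract bound exactly this type of averaged optimality gap and constraint violation as functions of the horizon $T$.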