Provably Efficient Sample Complexity for Robust CMDP
arXiv:2511.07486v1
Abstract: We study the problem of learning policies that maximize cumulative reward while satisfying safety constraints, even when the real environment differs from a simulator or nominal model. We focus on robust constrained Markov decision processes…
