Non-stationary and Varying-discounting Markov Decision Processes for Reinforcement Learning
arXiv:2511.17598v1
Abstract: Algorithms developed under stationary Markov Decision Processes (MDPs) often face challenges in non-stationary environments, and infinite-horizon formulations may not directly apply to finite-horizon tasks. To address these limitations, we introduce the Non-stationary and Varying-discounting MDP…
