Adversarial Latent-State Training for Robust Policies in Partially Observable Domains
arXiv:2603.07313v2 Announce Type: replace

Abstract: Robustness under latent distribution shift remains challenging in partially observable reinforcement learning. We formalize a focused setting, termed an adversarial latent-initial-state POMDP, in which an adversary selects a hidden initial latent distribution before the episode begins. Theoretically,…
