Interpretable Failure Analysis in Multi-Agent Reinforcement Learning Systems
arXiv:2602.08104v2 Announce Type: replace-cross

Abstract: Multi-Agent Reinforcement Learning (MARL) is increasingly deployed in safety-critical domains, yet methods for interpretable failure detection and attribution remain underdeveloped. We introduce a two-stage gradient-based framework that provides interpretable diagnostics for three critical failure analysis…
