A Flexible Framework for Incorporating Patient Preferences Into Q-Learning

arXiv:2307.12022v2 Announce Type: replace Abstract: In real-world healthcare settings, treatment decisions often involve optimizing for multivariate outcomes such as treatment efficacy and severity of side effects based on individual preferences. However, existing statistical methods for estimating dynamic treatment regimes (DTRs) usually assume a univariate outcome, and the few methods that deal with composite outcomes suffer from limitations such as restrictions to a single time point and limited theoretical guarantees. To address these limitations, we propose Latent Utility Q-Learning (LUQ-Learning), a latent model approach that adapts Q-learning to tackle the aforementioned difficulties. Our framework allows for an arbitrary finite number of decision points and outcomes, incorporates personal preferences, and achieves asymptotic performance guarantees with realistic assumptions. We conduct simulation experiments based on an ongoing trial for low back pain as well as a well-known trial for schizophrenia. In both settings, LUQ-Learning achieves highly competitive performance compared to alternative baselines.

2025-09-03 04:00 GMT · 21 hours ago arxiv.org

arXiv:2307.12022v2 Announce Type: replace Abstract: In real-world healthcare settings, treatment decisions often involve optimizing for multivariate outcomes such as treatment efficacy and severity of side effects based on individual preferences. However, existing statistical methods for estimating dynamic treatment regimes (DTRs) usually assume a univariate outcome, and the few methods that deal with composite outcomes suffer from limitations such as restrictions to a single time point and limited theoretical guarantees. To address these limitations, we propose Latent Utility Q-Learning (LUQ-Learning), a latent model approach that adapts Q-learning to tackle the aforementioned difficulties. Our framework allows for an arbitrary finite number of decision points and outcomes, incorporates personal preferences, and achieves asymptotic performance guarantees with realistic assumptions. We conduct simulation experiments based on an ongoing trial for low back pain as well as a well-known trial for schizophrenia. In both settings, LUQ-Learning achieves highly competitive performance compared to alternative baselines.

Original: https://arxiv.org/abs/2307.12022