In recent sequential multiple assignment randomized trials (SMARTs), outcomes were assessed repeatedly over time to evaluate the longer-term impacts of dynamic treatment regimes (DTRs). Q-learning requires a scalar response to identify the optimal DTR. Inverse probability weighting may be used to estimate the optimal outcome trajectory, but it is inefficient, susceptible to model mis-specification, and unable to characterize how treatment effects manifest over time. We propose modified Q-learning with generalized estimating equations to address these limitations and apply it to the M-bridge trial, which evaluates adaptive interventions to prevent problematic drinking among college freshmen. Simulation studies demonstrate that the proposed method improves efficiency and robustness.
Keywords: Q-learning; generalized estimating equation; heterogeneous treatment effect; longitudinal outcome trajectory; sequential multiple assignment randomized trial.
© The Royal Statistical Society 2023. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.