Development and Validation of a Machine Learning Prediction Model of Posttraumatic Stress Disorder After Military Deployment

Santiago Papini; Sonya B Norman; Laura Campbell-Sills; Xiaoying Sun; Feng He; Ronald C Kessler; Robert J Ursano; Sonia Jain; Murray B Stein

doi:10.1001/jamanetworkopen.2023.21273

JAMA Netw Open. 2023 Jun 1;6(6):e2321273. doi: 10.1001/jamanetworkopen.2023.21273.

Authors

Santiago Papini¹ ², Sonya B Norman¹ ³ ⁴, Laura Campbell-Sills¹, Xiaoying Sun⁵, Feng He⁵, Ronald C Kessler⁶, Robert J Ursano⁷, Sonia Jain⁵, Murray B Stein¹ ⁵ ⁸

Affiliations

¹ Department of Psychiatry, University of California, San Diego, La Jolla.
² Division of Research, Kaiser Permanente Northern California, Oakland.
³ National Center for PTSD, White River Junction, Vermont.
⁴ Veterans Affairs Center of Excellence for Stress and Mental Health, San Diego, California.
⁵ Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, La Jolla.
⁶ Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts.
⁷ Center for the Study of Traumatic Stress, Department of Psychiatry, Uniformed Services University of the Health Sciences, Bethesda, Maryland.
⁸ Psychiatry Service, Veterans Affairs San Diego Healthcare System, San Diego, California.

Abstract

Importance: Military deployment involves significant risk for life-threatening experiences that can lead to posttraumatic stress disorder (PTSD). Accurate predeployment prediction of PTSD risk may facilitate the development of targeted intervention strategies to enhance resilience.

Objective: To develop and validate a machine learning (ML) model to predict postdeployment PTSD.

Design, setting, and participants: This diagnostic/prognostic study included 4771 soldiers from 3 US Army brigade combat teams who completed assessments between January 9, 2012, and May 1, 2014. Predeployment assessments occurred 1 to 2 months before deployment to Afghanistan, and follow-up assessments occurred approximately 3 and 9 months post deployment. Machine learning models to predict postdeployment PTSD were developed in the first 2 recruited cohorts using as many as 801 predeployment predictors from comprehensive self-report assessments. In the development phase, cross-validated performance metrics and predictor parsimony were considered to select an optimal model. Next, the selected model's performance was evaluated with area under the receiver operating characteristic curve and expected calibration error in a temporally and geographically distinct cohort. Data analyses were performed from August 1 to November 30, 2022.
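
As an illustration of the calibration metric named above, the following is a minimal sketch of expected calibration error. It is not the authors' implementation. The function name, the optional per-participant weights, and the use of 10 equal-width probability bins are assumptions; the abstract does not state the binning scheme.

```python
def expected_calibration_error(probs, labels, weights=None, n_bins=10):
    """Weighted average gap between mean predicted risk and observed rate per bin.

    Hypothetical helper: equal-width bins on [0, 1] are an assumption.
    """
    if weights is None:
        weights = [1.0] * len(probs)
    total = sum(weights)
    # Per bin: [sum of weights, sum of w * predicted risk, sum of w * outcome]
    bins = [[0.0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y, w in zip(probs, labels, weights):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx][0] += w
        bins[idx][1] += w * p
        bins[idx][2] += w * y
    ece = 0.0
    for w_sum, wp, wy in bins:
        if w_sum > 0:
            # Bin's share of total weight times |mean prediction - observed rate|
            ece += (w_sum / total) * abs(wp / w_sum - wy / w_sum)
    return ece
```

A value near 0 means predicted risks track observed PTSD rates closely within each bin; the metric says nothing about how well the model separates cases from non-cases, which is why it is reported alongside the area under the curve.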

Main outcomes and measures: Posttraumatic stress disorder diagnosis was assessed by clinically calibrated self-report measures. Participants were weighted in all analyses to address potential biases related to cohort selection and follow-up nonresponse.
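
To show how participant weights can enter an analysis, here is a minimal sketch of a weighted prevalence and a weighted log loss. The function names and the clipping constant are assumptions for illustration; the abstract does not describe how the weights were constructed or applied.

```python
import math


def weighted_prevalence(labels, weights):
    """Share of total weight carried by participants who meet criteria (label 1)."""
    return sum(w * y for y, w in zip(labels, weights)) / sum(weights)


def weighted_log_loss(probs, labels, weights, eps=1e-15):
    """Weighted mean negative log-likelihood of binary outcomes.

    eps is an assumed clipping constant that avoids log(0).
    """
    total = sum(weights)
    loss = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)
        loss -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss / total
```

With all weights equal, both functions reduce to their unweighted forms; unequal weights let under-represented participants count for more in the estimate.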

Results: This study included 4771 participants (mean [SD] age, 26.9 [6.2] years), 4440 (94.7%) of whom were men. In terms of race and ethnicity, 144 participants (2.8%) identified as American Indian or Alaska Native, 242 (4.8%) as Asian, 556 (13.3%) as Black or African American, 885 (18.3%) as Hispanic, 106 (2.1%) as Native Hawaiian or other Pacific Islander, 3474 (72.2%) as White, and 430 (8.9%) as other or unknown race or ethnicity; participants could identify with more than 1 race or ethnicity. A total of 746 participants (15.4%) met PTSD criteria post deployment. In the development phase, models had comparable performance (log loss range, 0.372-0.375; area under the curve range, 0.75-0.76). A gradient-boosting machine with 58 core predictors was selected over an elastic net with 196 predictors and a stacked ensemble of ML models with 801 predictors. In the independent test cohort, the gradient-boosting machine had an area under the curve of 0.74 (95% CI, 0.71-0.77) and low expected calibration error of 0.032 (95% CI, 0.020-0.046). Approximately one-third of participants with the highest risk accounted for 62.4% (95% CI, 56.5%-67.9%) of the PTSD cases. Core predictors cut across 17 distinct domains: stressful experiences, social network, substance use, childhood or adolescence, unit experiences, health, injuries, irritability or anger, personality, emotional problems, resilience, treatment, anxiety, attention or concentration, family history, mood, and religion.
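
The discrimination and risk-concentration results above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the area under the curve is computed by direct pairwise comparison (suitable only for small samples), and the top-fraction cutoff is applied on cumulative weight as an assumption.

```python
def weighted_auc(probs, labels, weights):
    """Weighted probability that a random case outranks a random non-case.

    Ties count as one half; each pair is weighted by the product of its weights.
    """
    cases = [(p, w) for p, y, w in zip(probs, labels, weights) if y == 1]
    controls = [(p, w) for p, y, w in zip(probs, labels, weights) if y == 0]
    num = den = 0.0
    for pc, wc in cases:
        for pn, wn in controls:
            pair_w = wc * wn
            den += pair_w
            if pc > pn:
                num += pair_w
            elif pc == pn:
                num += 0.5 * pair_w
    return num / den


def case_share_in_top_fraction(probs, labels, weights, fraction=1 / 3):
    """Share of weighted cases found among the highest-risk fraction of participants."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    cutoff = fraction * sum(weights)
    total_cases = sum(w * y for y, w in zip(labels, weights))
    cum_w = captured = 0.0
    for i in order:
        if cum_w >= cutoff:
            break
        cum_w += weights[i]
        captured += weights[i] * labels[i]
    return captured / total_cases
```

For a model with no discrimination, the top third of participants would be expected to contain about one-third of cases, so a share well above that indicates that risk is concentrated in the flagged group.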

Conclusions and relevance: In this diagnostic/prognostic study of US Army soldiers, an ML model was developed to predict postdeployment PTSD risk with self-reported information collected before deployment. The optimal model showed good performance in a temporally and geographically distinct validation sample. These results indicate that predeployment stratification of PTSD risk is feasible and may facilitate the development of targeted prevention and early intervention strategies.

MeSH terms

Adolescent
Adult
Anxiety
Anxiety Disorders
Child
Ethnicity
Female
Humans
Male
Military Deployment
Stress Disorders, Post-Traumatic* / diagnosis
Stress Disorders, Post-Traumatic* / epidemiology