Adolescent HIV-related behavioural prediction using machine learning: a foundation for precision HIV prevention

Bo Wang; Feifan Liu; Lynette Deveaux; Arlene Ash; Samiran Gosh; Xiaoming Li; Elke Rundensteiner; Lesley Cottrell; Richard Adderley; Bonita Stanton

doi:10.1097/QAD.0000000000002867

AIDS. 2021 May 1;35(Suppl 1):S75-S84. doi: 10.1097/QAD.0000000000002867.

Affiliations

¹ Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, 368 Plantation Street, Worcester, Massachusetts, USA.
² Office of HIV/AIDS, Ministry of Health, Shirley Street, Nassau, The Bahamas.
³ Department of Family Medicine and Public Health Sciences, Wayne State University School of Medicine, Detroit, Michigan.
⁴ Department of Health Promotion, Education, and Behavior, University of South Carolina Arnold School of Public Health, Columbia, South Carolina.
⁵ Data Science, Worcester Polytechnic Institute, Worcester, Massachusetts.
⁶ Center for Excellence in Disabilities, West Virginia University, Morgantown, West Virginia.
⁷ Hackensack Meridian School of Medicine, Nutley, New Jersey, USA.

Abstract

Background: Precision prevention is increasingly important in HIV prevention research to move beyond universal interventions to those tailored for high-risk individuals. The current study was designed to develop machine learning algorithms for predicting adolescent HIV risk behaviours.

Methods: Comprehensive longitudinal data on adolescent risk behaviours, perceptions, peer and family influence, and neighbourhood risk factors were collected from 2564 grade-10 students, who were enrolled at baseline and followed for 24 months during 2008-2012. Machine learning techniques [support vector machine (SVM) and random forests] were applied to the longitudinal data to predict HIV risk behaviours. We focused on two adolescent risk behaviours: ever having had sex and having multiple sex partners. Twenty percent of the data were withheld for model testing.
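As an illustration of the hold-out step only, the following is a minimal sketch of withholding 20% of records for testing. The abstract does not say how the test portion was drawn; stratifying by outcome, the function name `stratified_holdout`, the seed, and the toy labels are all assumptions made for this example.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of each class.

    Stratification is an assumption here, not a detail taken from the study.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random order within each outcome class
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical toy outcome: 1 = reported the behaviour, 0 = did not.
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_holdout(labels)
```

Stratifying keeps the proportion of each outcome the same in the training and testing portions, which matters when the behaviour of interest is reported by a minority of students.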

Results: For predicting multiple sex partners, the SVM model with cost-sensitive learning achieved the highest sensitivity: 79.1%, with specificity of 75.4% and AUC of 0.86, on the training data (10-fold cross-validation), and 79.7%, with specificity of 76.5% and AUC of 0.86, on the testing data. For predicting ever having had sex, the random forest model performed best, with sensitivity of 78.5%, specificity of 73.1% and AUC of 0.84 on the training data, and sensitivity of 82.7%, specificity of 75.3% and AUC of 0.87 on the testing data.
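For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity and AUC can be computed from true labels and model scores, along with one common way to derive class weights for cost-sensitive learning. The abstract does not state how misclassification costs were set; the inverse-frequency ("balanced") weighting, the 0.5 threshold, the function names, and the toy data are assumptions for illustration and are not the study's data or code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_class_weights(y_true):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    An assumed weighting scheme; rarer classes get larger weights.
    """
    counts = {c: y_true.count(c) for c in set(y_true)}
    return {c: len(y_true) / (len(counts) * k) for c, k in counts.items()}

# Hypothetical toy data: 4 positives, 6 negatives, with model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
weights = balanced_class_weights(y_true)
```

Weighting the minority class more heavily penalises missed positives more than false alarms, which typically raises sensitivity at some cost to specificity.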

Conclusion: Machine learning methods can be used to build effective prediction models to identify adolescents who are likely to engage in HIV risk behaviours. This study builds a foundation for targeted intervention strategies and informs precision prevention efforts in school settings.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Adolescent
Adolescent Behavior*
Algorithms
HIV Infections* / diagnosis
HIV Infections* / prevention & control
Humans
Machine Learning
Risk Factors
