Background: Precision prevention is increasingly important in HIV prevention research to move beyond universal interventions to those tailored for high-risk individuals. The current study was designed to develop machine learning algorithms for predicting adolescent HIV risk behaviours.
Methods: Comprehensive longitudinal data on adolescent risk behaviours, perceptions, peer and family influence, and neighbourhood risk factors were collected from 2564 grade-10 students at baseline followed for 24 months over 2008-2012. Machine learning techniques [support vector machine (SVM) and random forests] were applied to innovatively leverage longitudinal data for robust HIV risk behaviour prediction. In this study, we focused on two adolescent risk behaviours: had ever had sex and had multiple sex partners. Twenty percent of the data were withheld for model testing.
Results: The SVM model with cost-sensitive learning achieved the highest sensitivity, at 79.1%, specificity of 75.4% with AUC of 0.86 in predicting multiple sex partners on the training data (10-fold cross-validation), and sensitivity of 79.7%, specificity of 76.5% with AUC of 0.86 on the testing data. The random forest model obtained the best performance in predicting had ever had sex, yielding the sensitivity of 78.5%, specificity of 73.1% with AUC of 0.84 on the training data and sensitivity of 82.7%, specificity of 75.3% with AUC of 0.87 on the testing data.
Conclusion: Machine learning methods can be used to build effective prediction model(s) to identify adolescents who are likely to engage in HIV risk behaviours. This study builds a foundation for targeted intervention strategies and informs precision prevention efforts in school-setting.
Copyright © 2021 Wolters Kluwer Health, Inc. All rights reserved.