A Machine Learning Approach to Identify Predictors of Frequent Vaping and Vulnerable Californian Youth Subgroups

Nicotine Tob Res. 2022 Jun 15;24(7):1028-1036. doi: 10.1093/ntr/ntab257.

Abstract

Introduction: Machine learning presents a unique opportunity to improve electronic cigarette (vaping) monitoring in youth. Here we built a random forest model to predict frequent vaping status among Californian youth and to identify contributing factors and vulnerable populations.

Methods: In this prospective cohort study, 1281 ever-vaping twelfth-grade students from metropolitan Los Angeles were surveyed at baseline in the fall and again at a 6-month follow-up in the spring. Frequent vaping at the 6-month follow-up was defined as vaping a nicotine-containing product on 20 or more of the past 30 days. Predictors (n = 131) encompassed sociodemographic characteristics, substance use and perceptions, health status, and characteristics of the household, school, and neighborhood. A random forest was developed to identify the top ten predictors of frequent vaping and their interactions with sociodemographic variables.
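The outcome definition above can be written as a simple labeling rule. The sketch below assumes that each participant's follow-up survey yields a count of days (0 to 30) on which they vaped a nicotine-containing product in the past 30 days; the function and parameter names (`is_frequent_vaper`, `nicotine_vape_days`, `prevalence_percent`) are hypothetical and are not taken from the study's code.

```python
from typing import Iterable

# Threshold from the study's definition: nicotine vaping on >= 20 of the past 30 days.
FREQUENT_DAYS_THRESHOLD = 20


def is_frequent_vaper(nicotine_vape_days: int) -> bool:
    """Label one participant at the 6-month follow-up (hypothetical helper).

    nicotine_vape_days: assumed survey field giving the number of days in the
    past 30 days on which the participant vaped a nicotine-containing product.
    """
    if not 0 <= nicotine_vape_days <= 30:
        raise ValueError("nicotine_vape_days must be between 0 and 30")
    return nicotine_vape_days >= FREQUENT_DAYS_THRESHOLD


def prevalence_percent(days_per_participant: Iterable[int]) -> float:
    """Percentage of participants labeled as frequent vapers."""
    labels = [is_frequent_vaper(d) for d in days_per_participant]
    if not labels:
        raise ValueError("no participants supplied")
    return 100.0 * sum(labels) / len(labels)
```

With 40 of 1281 participants meeting the threshold, as reported in the Results, `prevalence_percent` returns about 3.1.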

Results: Forty participants (3.1%) reported frequent vaping at the follow-up. The random forest outperformed a logistic regression model in prediction (C-Index = 0.87 vs. 0.77). Higher past-month nicotine concentration in vape products, more daily vaping sessions, and greater nicotine dependence were the top three of the ten most important predictors of frequent vaping. Interactions were found between age and perceived discrimination, and between age and race/ethnicity: participants who were younger than their classmates and who either reported frequently experiencing discrimination or identified as Asian or Native American/Pacific Islander were at increased risk of becoming frequent vapers.
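For a binary outcome such as frequent vaping status, the C-index is the probability that a randomly chosen frequent vaper receives a higher predicted risk than a randomly chosen non-frequent vaper. This equals the area under the ROC curve: 0.5 corresponds to chance ranking and 1.0 to perfect ranking. The minimal pure-Python sketch below assumes that standard definition, with tied predictions counted as half-concordant; the function name `c_index` is illustrative, and the abstract does not state which software the authors used.

```python
from typing import Sequence


def c_index(risk_scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance index for a binary outcome (equivalent to ROC AUC).

    risk_scores: predicted risk of frequent vaping for each participant.
    outcomes: 1 if the participant was a frequent vaper at follow-up, else 0.
    """
    if len(risk_scores) != len(outcomes):
        raise ValueError("risk_scores and outcomes must have the same length")
    cases = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    controls = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    concordant = 0.0
    # Compare every (case, control) pair: the pair is concordant when the
    # case has the higher predicted risk; tied predictions count as 0.5.
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


print(c_index([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in cohort size. With 40 cases and 1241 non-cases that is 49,640 comparisons, which is trivial, but a rank-based formula would scale better to much larger cohorts.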

Conclusions: Machine learning can produce models that accurately predict the progression of vaping behaviors among youth. The potential association between frequent vaping and perceived discrimination warrants more in-depth analyses to determine whether discrimination is a cause of increased vaping.

Implications: This study demonstrates the utility of machine learning in predicting frequent vaping status over 6 months and in understanding predictors and nuanced intersectionality by sociodemographic attributes. The high performance of the random forest model has practical implications for a personalized risk calculator that supports vaping prevention programs. Public health officials need to recognize the importance of social factors that contribute to frequent vaping, particularly perceived discrimination. Youth subpopulations, including younger high school students and Asian or Native American/Pacific Islander youth, might require specially designed interventions to help prevent the formation of vaping habits.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Electronic Nicotine Delivery Systems*
  • Humans
  • Machine Learning
  • Nicotine
  • Prospective Studies
  • Vaping*

Substances

  • Nicotine