Optimizing hepatitis B virus screening in the United States using a simple demographics-based model

Hepatology. 2022 Feb;75(2):430-437. doi: 10.1002/hep.32142. Epub 2021 Dec 7.

Abstract

Background and aims: Chronic hepatitis B (CHB) affects >290 million persons globally, and only 10% have been diagnosed, presenting a severe gap that must be addressed. We developed logistic regression (LR) and machine learning (ML; random forest) models to accurately identify patients with HBV, using only easily obtained demographic data from a population-based data set.

Approach and results: We identified participants with data on HBsAg, birth year, sex, race/ethnicity, and birthplace from 10 cycles of the National Health and Nutrition Examination Survey (1999-2018) and divided them into two cohorts: training (cycles 2, 3, 5, 6, 8, and 10; n = 39,119) and validation (cycles 1, 4, 7, and 9; n = 21,569). We then developed and tested our two models. The overall cohort was 49.2% male, 39.7% White, 23.2% Black, 29.6% Hispanic, and 7.5% Asian/other, with a median birth year of 1973. In multivariable logistic regression, the following factors were associated with HBV infection: birth year 1991 or after (adjusted OR [aOR], 0.28; p < 0.001); male sex (aOR, 1.49; p = 0.0080); Black and Asian/other versus White (aOR, 5.23 and 9.13; p < 0.001 for both); and being USA-born (vs. foreign-born; aOR, 0.14; p < 0.001). We found that the ML model consistently outperformed the LR model, with higher area under the receiver operating characteristic curve (AUROC) values (0.83 vs. 0.75 in validation cohort; p < 0.001) and better differentiation of high- and low-risk persons.
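As an illustration only, the quantities reported above can be sketched in a few lines of Python. This is not the authors' code. The function and key names (`cohort_for_cycle`, `combined_odds_ratio`, `auroc`, and the keys of `AOR`) are hypothetical, the reference levels are those implied by the abstract, and multiplying adjusted odds ratios assumes a logistic model without interaction terms, which the abstract does not state. No aOR is given here for Hispanic participants, so none is included.

```python
from math import prod

# Adjusted odds ratios as reported in the abstract. Reference levels are
# assumed from the wording: White race, foreign-born, female sex.
AOR = {
    "born_1991_or_after": 0.28,
    "male": 1.49,
    "black": 5.23,
    "asian_other": 9.13,
    "usa_born": 0.14,
}

# NHANES cycle numbers assigned to each cohort in the abstract.
TRAINING_CYCLES = {2, 3, 5, 6, 8, 10}
VALIDATION_CYCLES = {1, 4, 7, 9}


def cohort_for_cycle(cycle):
    """Return 'training' or 'validation' for an NHANES cycle number (1-10)."""
    if cycle in TRAINING_CYCLES:
        return "training"
    if cycle in VALIDATION_CYCLES:
        return "validation"
    raise ValueError(f"unknown cycle: {cycle}")


def combined_odds_ratio(traits):
    """Multiply the aORs of the given traits.

    Assumption: the logistic model has no interaction terms, so odds
    ratios combine multiplicatively relative to the reference profile.
    """
    return prod(AOR[t] for t in traits)


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under these assumptions, `combined_odds_ratio(["male", "asian_other"])` multiplies 1.49 by 9.13, giving roughly 13.6 times the odds of the reference profile. The pairwise `auroc` is quadratic in the number of participants, so for cohorts of the size reported here a rank-based implementation from a statistics library would be the more practical choice.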

Conclusions: Our ML model provides a simple, targeted approach to HBV screening, using only easily obtained demographic data.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Asian
  • Birth Cohort
  • Black People
  • Demography
  • Epidemiological Models
  • Female
  • Hepatitis B, Chronic / diagnosis*
  • Hepatitis B, Chronic / ethnology
  • Hispanic or Latino
  • Humans
  • Logistic Models*
  • Machine Learning*
  • Male
  • Mass Screening
  • Nutrition Surveys
  • Patient Selection
  • ROC Curve
  • Sex Factors
  • United States / epidemiology
  • White People