Estimating the conditional probability of developing human papilloma virus related oropharyngeal cancer by combining machine learning and inverse Bayesian modelling

PLoS Comput Biol. 2021 Aug 20;17(8):e1009289. doi: 10.1371/journal.pcbi.1009289. eCollection 2021 Aug.

Abstract

The epidemic increase in the incidence of Human Papilloma Virus (HPV) related Oropharyngeal Squamous Cell Carcinomas (OPSCCs) in several countries worldwide represents a significant public health concern. Although gender-neutral HPV vaccination programmes are expected to reduce the incidence rates of OPSCCs, these effects will not be evident in the foreseeable future. Secondary prevention strategies are currently not feasible due to an incomplete understanding of the natural history of oral HPV infections in OPSCCs. The key parameters that govern natural history models remain largely ill-defined for HPV-related OPSCCs and cannot be easily inferred from experimental data. Mathematical models have been used to estimate some of these ill-defined parameters in cervical cancer, another HPV-related cancer, leading to the successful implementation of cancer prevention strategies. We outline a "double-Bayesian" mathematical modelling approach, whereby a Bayesian machine learning model first estimates the probability of an individual having an oral HPV infection, given OPSCC and other covariate information. The model is then inverted using Bayes' theorem to reverse the probability relationship, giving the probability of developing OPSCC given an oral HPV infection. To derive our model, we use data from the Surveillance, Epidemiology, and End Results (SEER) cancer registry, the SEER Head and Neck with HPV Database and the National Health and Nutrition Examination Surveys (NHANES), representing the adult population in the United States. The model includes 8,106 OPSCC patients, of whom 73.0% had an oral HPV infection. When stratified by age, sex, marital status and race/ethnicity, the model estimated a higher conditional probability of developing OPSCC given an oral HPV infection in non-Hispanic White males and females than in other races/ethnicities.
The proposed Bayesian model represents a proof-of-concept of a natural history model of HPV-driven OPSCCs and outlines a strategy for estimating the conditional probability of an individual's risk of developing OPSCC following an oral HPV infection.
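
The inversion step described in the abstract can be illustrated with a minimal sketch. For a fixed covariate stratum x, Bayes' theorem gives P(OPSCC | HPV, x) = P(HPV | OPSCC, x) * P(OPSCC | x) / P(HPV | x). The function name `invert_bayes` and its parameter names are hypothetical, and the example values for P(OPSCC | x) and P(HPV | x) are placeholders chosen only for illustration, not figures from the paper; only the 0.73 value echoes the overall proportion of HPV-positive patients reported above. This is not the authors' implementation, which estimates the first term with a Bayesian machine learning model.

```python
def invert_bayes(p_hpv_given_opscc, p_opscc, p_hpv):
    """Return P(OPSCC | HPV, x) for one covariate stratum x.

    p_hpv_given_opscc: P(HPV | OPSCC, x), e.g. the output of a classifier
    p_opscc:           P(OPSCC | x), e.g. derived from registry incidence
    p_hpv:             P(HPV | x), e.g. derived from survey prevalence
    All names are hypothetical and used for illustration only.
    """
    inputs = {
        "p_hpv_given_opscc": p_hpv_given_opscc,
        "p_opscc": p_opscc,
        "p_hpv": p_hpv,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    if p_hpv == 0.0:
        raise ValueError("p_hpv must be positive to condition on HPV infection")

    # Bayes' theorem: posterior = likelihood * prior / evidence
    posterior = p_hpv_given_opscc * p_opscc / p_hpv
    if posterior > 1.0:
        # The three inputs cannot all describe the same population
        raise ValueError("inputs are mutually inconsistent (posterior > 1)")
    return posterior


if __name__ == "__main__":
    # Placeholder inputs: 0.73 echoes the overall HPV-positive share above;
    # 5e-5 and 0.07 are assumed values, not estimates from the study.
    print(invert_bayes(p_hpv_given_opscc=0.73, p_opscc=5e-5, p_hpv=0.07))
```

In practice the three inputs would come from different data sources for the same stratum, so the consistency check on the result is a useful guard against mismatched populations.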

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Alphapapillomavirus / pathogenicity*
  • Bayes Theorem*
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Oropharyngeal Neoplasms / epidemiology
  • Oropharyngeal Neoplasms / virology*
  • Probability*
  • SEER Program
  • Squamous Cell Carcinoma of Head and Neck / epidemiology
  • Squamous Cell Carcinoma of Head and Neck / virology*

Grants and funding

This work was funded by a grant (ICE-2015-1037) awarded to JOL and CM by Health Research Board Ireland (www.hrb.ie). ACP is funded by Science Foundation Ireland (www.sfi.ie) Career Development Award grant 17/CDA/4695 and an SFI centre grant 12/RC/2289_P2. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.