Development and Testing of a Machine Learning Model Using 18F-Fluorodeoxyglucose PET/CT-Derived Metabolic Parameters to Classify Human Papillomavirus Status in Oropharyngeal Squamous Carcinoma

Korean J Radiol. 2023 Jan;24(1):51-61. doi: 10.3348/kjr.2022.0397.

Abstract

Objective: To develop and test a machine learning model for classifying the human papillomavirus (HPV) status of patients with oropharyngeal squamous cell carcinoma (OPSCC), using 18F-fluorodeoxyglucose (18F-FDG) PET-derived parameters and an appropriate combination of machine learning methods.

Materials and methods: This retrospective study enrolled 126 patients (118 male; mean age, 60 years) with newly diagnosed, pathologically confirmed OPSCC who underwent 18F-FDG PET/computed tomography (CT) between January 2012 and February 2020. Patients were randomly assigned to training and internal validation sets in a 7:3 ratio. An external test set of 19 patients (16 male; mean age, 65.3 years) was recruited sequentially from two other tertiary hospitals. Model 1 used only PET parameters, Model 2 used only clinical features, and Model 3 used both PET and clinical parameters. Multiple combinations of feature transformation, feature selection, oversampling, and classifier training methods were investigated. The three models that performed best in the internal validation set were then evaluated on the external test set. Areas under the receiver operating characteristic curve (AUCs) were compared between models.
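The workflow above (random 7:3 split, feature transformation, mutual information feature selection, an ExtraTrees classifier, and AUC evaluation) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: the data are synthetic, and the number of features, k = 5, the stratified split, and the ExtraTrees settings are all assumptions. Oversampling is left out because scikit-learn's `Pipeline` does not accept resampling steps; imbalanced-learn's `imblearn.pipeline.Pipeline` does.

```python
from functools import partial

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the cohort (NOT the study data): 126 patients and
# 10 hypothetical features standing in for PET-derived and clinical variables.
n_patients, n_features = 126, 10
X = rng.normal(size=(n_patients, n_features))
# Hypothetical binary label (1 = HPV-positive), weakly driven by two features.
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (logits + rng.normal(scale=1.0, size=n_patients) > 0).astype(int)

# 7:3 random split into training and internal validation sets
# (stratified by class here, which is an assumption).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Min-max scaling -> mutual information selection -> ExtraTrees.
# k=5 mirrors the five features of Model 3 but is an assumption here.
model = Pipeline([
    ("scale", MinMaxScaler()),
    ("select", SelectKBest(score_func=partial(mutual_info_classif, random_state=0), k=5)),
    ("clf", ExtraTreesClassifier(n_estimators=200, random_state=0)),
])
model.fit(X_train, y_train)

# AUC on the held-out internal validation set.
val_scores = model.predict_proba(X_val)[:, 1]
val_auc = roc_auc_score(y_val, val_scores)
selected = model.named_steps["select"].get_support(indices=True)
print(f"selected feature indices: {selected}")
print(f"internal validation AUC: {val_auc:.2f}")
```

Putting the scaler and selector inside the pipeline means both are fitted on the training split only, so the validation AUC is not inflated by information from the held-out patients.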

Results: In the external test set, Model 3, an ExtraTrees classifier using two PET-derived parameters and three clinical features combined with MinMaxScaler transformation, mutual information feature selection, and the adaptive synthetic sampling approach, showed the best performance (AUC = 0.78; 95% confidence interval, 0.46-1). For predicting HPV status, Model 3 had a higher AUC than Model 1, which used PET parameters alone (AUC = 0.48, p = 0.047), and Model 2, which used clinical parameters alone (AUC = 0.52, p = 0.142); only the difference from Model 1 was statistically significant at the 0.05 level.
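The abstract does not state how the 95% confidence interval or the p values were computed. One common option for the interval, shown here purely as an assumption, is a percentile bootstrap over patients, with the AUC computed as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count one half). The 19-patient labels and scores in the usage lines are invented; the point is only that resampling so few patients yields wide intervals.

```python
import numpy as np


def auc_mann_whitney(y_true, scores):
    """AUC as P(score of random positive > score of random negative), ties = 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Full pairwise comparison is cheap for small test sets such as n = 19.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap CI, resampling patients with replacement."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    if len(np.unique(y_true)) < 2:
        raise ValueError("both classes are needed to compute an AUC")
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        # Skip resamples containing a single class: the AUC is undefined there.
        if y_true[idx].min() == y_true[idx].max():
            continue
        aucs.append(auc_mann_whitney(y_true[idx], scores[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return auc_mann_whitney(y_true, scores), lo, hi


# Usage with a hypothetical 19-patient external test set (invented data).
rng = np.random.default_rng(1)
y_test = np.array([1] * 10 + [0] * 9)
scores_test = np.clip(0.5 + 0.2 * (y_test - 0.5) + rng.normal(0.0, 0.25, size=19), 0.0, 1.0)
auc, lo, hi = bootstrap_auc_ci(y_test, scores_test)
print(f"AUC = {auc:.2f}; 95% CI, {lo:.2f}-{hi:.2f}")
```

Comparing two models' AUCs on the same patients (as the reported p values do) needs a paired test; DeLong's test is a common choice, but which test the study used is not stated here.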

Conclusion: An ExtraTrees-based classifier of HPV status in OPSCC, developed with oversampling and mutual information feature selection and combining metabolic parameters derived from 18F-FDG PET/CT with clinical parameters, showed higher performance than models using either PET or clinical parameters alone.
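The oversampling method used with the best model is the adaptive synthetic sampling approach (ADASYN, He et al., 2008). It creates synthetic minority-class samples by interpolating between a minority point and one of its minority neighbours, and assigns more synthetic samples to minority points whose k nearest neighbours are mostly from the majority class. The NumPy sketch below is a self-contained illustration, not the study's implementation (imbalanced-learn provides `imblearn.over_sampling.ADASYN` for practical use). Which HPV class was the minority is not given in the abstract, so `minority` is a parameter; `k`, `beta`, the uniform fallback, and the demo data are assumptions. Oversampling should be applied to the training set only, so that synthetic samples never reach validation or test data.

```python
import numpy as np


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Minimal ADASYN sketch: weight each minority point by the share of
    majority-class points among its k nearest neighbours, then interpolate
    toward random minority neighbours to create synthetic samples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority)
    X_min = X[min_idx]
    n_min, n_maj = len(min_idx), len(y) - len(min_idx)
    G = int(round((n_maj - n_min) * beta))  # total synthetic samples wanted
    if G <= 0 or n_min < 2:
        return X, y

    # k nearest neighbours of each minority point among ALL samples (self excluded).
    d_all = ((X_min[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    d_all[np.arange(n_min), min_idx] = np.inf
    k_all = min(k, len(X) - 1)
    nn_all = np.argsort(d_all, axis=1)[:, :k_all]
    r = (y[nn_all] != minority).sum(axis=1) / k_all  # local difficulty ratio
    # Fallback (an assumption): if no minority point has majority neighbours,
    # spread the synthetic samples uniformly.
    r = r / r.sum() if r.sum() > 0 else np.full(n_min, 1.0 / n_min)
    g = np.rint(r * G).astype(int)  # per-point counts; total is approximately G

    # Nearest neighbours of each minority point among minority points only.
    d_min = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d_min, np.inf)
    nn_min = np.argsort(d_min, axis=1)[:, :min(k, n_min - 1)]

    synthetic = []
    for i, g_i in enumerate(g):
        for _ in range(g_i):
            j = rng.choice(nn_min[i])
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    if not synthetic:
        return X, y
    X_syn = np.vstack(synthetic)
    y_syn = np.full(len(X_syn), minority, dtype=y.dtype)
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])


# Usage on invented, overlapping 2-D data: 30 majority vs 10 minority samples.
rng_demo = np.random.default_rng(42)
X_demo = np.vstack([rng_demo.normal(0.0, 1.0, size=(30, 2)),
                    rng_demo.normal(1.0, 1.0, size=(10, 2))])
y_demo = np.array([0] * 30 + [1] * 10)
X_res, y_res = adasyn(X_demo, y_demo, minority=1, k=5)
print(np.bincount(y_res))
```

Because each synthetic point is a convex combination of two minority samples, it always lies between existing minority cases; the adaptive part is only in how many samples each minority point receives.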

Keywords: Human papillomavirus; Machine learning; Oropharynx; Positron emission tomography; Squamous cell carcinoma.

Publication types

  • Randomized Controlled Trial
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Carcinoma, Squamous Cell* / diagnostic imaging
  • Female
  • Fluorodeoxyglucose F18
  • Head and Neck Neoplasms*
  • Human Papillomavirus Viruses
  • Humans
  • Machine Learning
  • Male
  • Middle Aged
  • Oropharyngeal Neoplasms* / diagnosis
  • Papillomavirus Infections* / diagnostic imaging
  • Positron Emission Tomography Computed Tomography
  • Retrospective Studies
  • Squamous Cell Carcinoma of Head and Neck
  • Tomography, X-Ray Computed

Substances

  • Fluorodeoxyglucose F18