Using random forest machine learning on data from a large, representative cohort of the general population improves clinical spirometry references

Clin Respir J. 2023 Aug;17(8):819-828. doi: 10.1111/crj.13662. Epub 2023 Jul 13.

Abstract

Introduction: Spirometry is associated with several diagnostic difficulties, and as a result, chronic obstructive pulmonary disease (COPD) is sometimes misdiagnosed. This study aims to investigate how random forest (RF) can be used to improve the existing clinical reference values for forced vital capacity (FVC) and forced expiratory volume in 1 s (FEV1) in a large and representative cohort of the general population of the US without known lung disease.

Materials and methods: FVC, FEV1, body measures, and demographic data from 23 433 people were extracted from NHANES. RF was used to develop different prediction models. The accuracy of RF was compared with the existing Danish clinical references, an improved multiple linear regression (MLR) model, and a model from the literature.

Results: For RF, the correlations between actual and predicted values (with 95% confidence intervals) were FVC = 0.85 (0.85; 0.86) (p < 0.001) and FEV1 = 0.92 (0.92; 0.93) (p < 0.001); for the existing clinical references, they were FVC = 0.66 (0.64; 0.68) (p < 0.001) and FEV1 = 0.69 (0.67; 0.70) (p < 0.001). Slope and intercept for the RF models were FVC: 1.06 and -238.04 (mL), and FEV1: 0.86 and 455.36 (mL); for the MLR models, they were FVC: 0.99 and 38.56 (mL), and FEV1: 1.01 and -56.57 (mL).
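As an illustration of the agreement statistics reported above, the following minimal sketch computes a Pearson correlation with a 95% confidence interval, plus the slope and intercept of a least-squares line through actual and predicted values. The abstract does not state how the confidence interval was obtained or which variable was regressed on which, so the Fisher z-transform and the choice to regress predicted on actual are assumptions, and the volumes are hypothetical toy numbers rather than NHANES data.

```python
import math


def pearson_with_ci(actual, predicted, z_crit=1.96):
    """Pearson r with an approximate 95% CI via the Fisher z-transform.

    Assumption: the Fisher method is used here for illustration only;
    the abstract does not say how its intervals were computed.
    Requires more than 3 paired observations and |r| < 1.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    sxy = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    sxx = sum((a - mean_a) ** 2 for a in actual)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r)                 # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return r, math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def slope_intercept(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical volumes in mL, invented purely for this example.
actual_ml = [3200, 4100, 2800, 5000, 3600, 4400]
predicted_ml = [3300, 4000, 2950, 4800, 3500, 4500]

r, ci_low, ci_high = pearson_with_ci(actual_ml, predicted_ml)
slope, intercept_ml = slope_intercept(actual_ml, predicted_ml)

print(f"r = {r:.2f} ({ci_low:.2f}; {ci_high:.2f})")
print(f"slope = {slope:.2f}, intercept = {intercept_ml:.2f} mL")
```

Under this reading, a slope near 1 with an intercept near 0 mL means the predictions track the measured values without systematic offset, which is why the size of the intercepts matters when comparing models.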

Conclusions: The results indicate that machine learning models such as RF have the potential to improve the prediction of estimated lung function for individual patients. These predictions are used as reference values and are an important part of assessing spirometry measurements in clinical practice. Further work is needed to reduce the size of the intercepts obtained in these results.

Keywords: COPD; clinical references; misdiagnosis; multiple linear regression; random forest; spirometry.

Publication types

  • Letter

MeSH terms

  • Forced Expiratory Volume
  • Humans
  • Lung
  • Nutrition Surveys
  • Pulmonary Disease, Chronic Obstructive* / diagnosis
  • Pulmonary Disease, Chronic Obstructive* / epidemiology
  • Random Forest*
  • Spirometry / methods
  • Vital Capacity