Predicting total lung capacity from spirometry: a machine learning approach

Luka Beverin; Marko Topalovic; Armin Halilovic; Paul Desbordes; Wim Janssens; Maarten De Vos

doi:10.3389/fmed.2023.1174631

Front Med (Lausanne). 2023 May 19;10:1174631. doi: 10.3389/fmed.2023.1174631. eCollection 2023.

Authors

Luka Beverin¹, Marko Topalovic², Armin Halilovic², Paul Desbordes², Wim Janssens³, Maarten De Vos⁴,²

Affiliations

¹ Statistics Research Centre, KU Leuven, Leuven, Belgium.
² ArtiQ NV, Leuven, Belgium.
³ Laboratory of Respiratory Diseases and Thoracic Surgery, Department of Chronic Diseases, Metabolism and Ageing, KU Leuven, Leuven, Belgium.
⁴ Stadius, Department of Electrical Engineering, KU Leuven, Leuven, Belgium.

Abstract

Background and objective: Spirometry patterns can suggest that a patient has a restrictive ventilatory impairment; however, lung volume measurements such as total lung capacity (TLC) are required to confirm the diagnosis. The aim of the study was to train a supervised machine learning model that can accurately estimate TLC values from spirometry and subsequently identify which patients would most benefit from undergoing a complete pulmonary function test.

Methods: We trained three tree-based machine learning models on 51,761 spirometry data points with corresponding TLC measurements. We then compared model performance using an independent test set consisting of 1,402 patients. The best-performing model was used to retrospectively identify restrictive ventilatory impairment in the same test set. The algorithm was compared against different spirometry patterns commonly used to predict restriction.
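The model-comparison step can be illustrated with a minimal sketch that scores each candidate's predictions against measured TLC on a held-out set and keeps the model with the lowest error. The model names, TLC values, and predictions below are synthetic placeholders, not study data, and the abstract does not state that the authors' pipeline took this form. Note that a mean squared error is expressed in squared units (mL²), whereas its square root (RMSE) is in the same units as TLC.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference; units are the square of the input units."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic measured TLC values in mL (illustrative only, not study data).
tlc_measured = [5000, 6200, 4100, 5600]

# Hypothetical test-set predictions from three candidate models.
# The names are placeholders, not the models used in the study.
predictions = {
    "model_a": [5200, 6000, 4500, 5500],
    "model_b": [5100, 6100, 4200, 5500],
    "model_c": [5000, 6500, 4100, 5300],
}

# Score every candidate on the same held-out data.
mse_by_model = {
    name: mean_squared_error(tlc_measured, pred)
    for name, pred in predictions.items()
}

# Keep the model with the lowest test-set error.
best_model = min(mse_by_model, key=mse_by_model.get)
best_rmse_ml = math.sqrt(mse_by_model[best_model])

print(f"best model: {best_model}, RMSE = {best_rmse_ml:.1f} mL")
```

Scoring all candidates on one independent test set, as the Methods describe, keeps the comparison fair because no model is evaluated on data it was trained on.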

Results: The prevalence of restrictive ventilatory impairment in the test set was 16.7% (234/1,402). CatBoost was the best-performing machine learning model. It predicted TLC with a mean squared error (MSE) of 560.1 mL. The sensitivity, specificity, and F1-score of the optimal algorithm for predicting restrictive ventilatory impairment were 83%, 92%, and 75%, respectively.
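For readers less familiar with these metrics, the sketch below shows how prevalence, sensitivity, specificity, and F1-score follow from a confusion matrix. Only the totals (234 restrictive cases among 1,402 patients) come from the abstract. The split into true and false positives and negatives is a hypothetical assumption, chosen so that the rounded metrics match the reported percentages; the abstract does not report the actual confusion matrix.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, and F1-score from confusion counts."""
    sensitivity = tp / (tp + fn)          # share of true cases detected
    specificity = tn / (tn + fp)          # share of non-cases correctly ruled out
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    return sensitivity, specificity, f1


# Totals reported in the abstract.
n_patients = 1402
n_restrictive = 234
prevalence = n_restrictive / n_patients

# HYPOTHETICAL confusion counts (not from the paper), constrained so that
# tp + fn = 234 and fp + tn = 1402 - 234 = 1168.
tp, fn = 195, 39
fp, tn = 93, 1075

sensitivity, specificity, f1 = classification_metrics(tp, fn, fp, tn)

print(f"prevalence:  {prevalence:.1%}")
print(f"sensitivity: {sensitivity:.0%}")
print(f"specificity: {specificity:.0%}")
print(f"F1-score:    {f1:.0%}")
```

With a prevalence near 17%, specificity alone can look strong even when many flagged patients are false positives, which is why the F1-score is lower than either sensitivity or specificity here.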

Conclusion: A machine learning model trained on spirometry data can estimate TLC to a high degree of accuracy. This approach could be used to develop future smart home-based spirometry solutions, which could aid decision making and self-monitoring in patients with restrictive lung diseases.

Keywords: interstitial lung disease; machine learning; restriction; spirometry; total lung capacity.