Machine-Learning Approaches for Predicting the Need of Oxygen Therapy in Early-Stage COVID-19 in Japan: Multicenter Retrospective Observational Study

Front Med (Lausanne). 2022 Feb 23:9:846525. doi: 10.3389/fmed.2022.846525. eCollection 2022.

Abstract

Background: Early prediction of oxygen therapy in patients with coronavirus disease 2019 (COVID-19) is vital for triage. Several machine-learning prognostic models for COVID-19 are currently available. However, external validation of these models has rarely been performed. Therefore, most reported predictive performance is optimistic and has a high risk of bias. This study aimed to develop and validate a model that predicts oxygen therapy needs in the early stages of COVID-19 using a sizable multicenter dataset.

Methods: This multicenter retrospective study included consecutive COVID-19 hospitalized patients confirmed by a reverse transcription chain reaction in 11 medical institutions in Fukui, Japan. We developed and validated seven machine-learning models (e.g., penalized logistic regression model) using routinely collected data (e.g., demographics, simple blood test). The primary outcome was the need for oxygen therapy (≥1 L/min or SpO2 ≤ 94%) during hospitalization. C-statistics, calibration slope, and association measures (e.g., sensitivity) evaluated the performance of the model using the test set (randomly selected 20% of data for internal validation). Among these seven models, the machine-learning model that showed the best performance was re-evaluated using an external dataset. We compared the model performances using the A-DROP criteria (modified version of CURB-65) as a conventional method.

Results: Of the 396 patients with COVID-19 for the model development, 102 patients (26%) required oxygen therapy during hospitalization. For internal validation, machine-learning models, except for the k-point nearest neighbor, had a higher discrimination ability than the A-DORP criteria (P < 0.01). The XGboost had the highest c-statistic in the internal validation (0.92 vs. 0.69 in A-DROP criteria; P < 0.001). For the external validation with 728 temporal independent datasets (106 patients [15%] required oxygen therapy), the XG boost model had a higher c-statistic (0.88 vs. 0.69 in A-DROP criteria; P < 0.001).

Conclusions: Machine-learning models demonstrated a more significant performance in predicting the need for oxygen therapy in the early stages of COVID-19.

Keywords: COVID-19; PROBAST; TRIPOD; machine learning; medical triage; multicenter; prognostic model.