External validity of machine learning-based prognostic scores for cystic fibrosis: A retrospective study using the UK and Canadian registries

Yuchao Qin; Ahmed Alaa; Andres Floto; Mihaela van der Schaar

doi:10.1371/journal.pdig.0000179

PLOS Digit Health. 2023 Jan 12;2(1):e0000179. doi: 10.1371/journal.pdig.0000179. eCollection 2023 Jan.

Authors

Yuchao Qin¹, Ahmed Alaa²,³, Andres Floto¹, Mihaela van der Schaar¹,⁴,⁵

Affiliations

¹ University of Cambridge, Cambridge, United Kingdom.
² University of California Berkeley, Berkeley, California, United States of America.
³ University of California San Francisco, San Francisco, California, United States of America.
⁴ Alan Turing Institute, London, United Kingdom.
⁵ University of California Los Angeles, Los Angeles, California, United States of America.

Abstract

Precise and timely referral for lung transplantation is critical for the survival of cystic fibrosis patients with terminal illness. While machine learning (ML) models have been shown to achieve significant improvement in prognostic accuracy over current referral guidelines, the external validity of these models and their resulting referral policies has not been fully investigated. Here, we studied the external validity of machine learning-based prognostic models using annual follow-up data from the UK and Canadian Cystic Fibrosis Registries. Using a state-of-the-art automated ML framework, we derived a model for predicting poor clinical outcomes in patients enrolled in the UK registry, and conducted external validation of the derived model using the Canadian Cystic Fibrosis Registry. In particular, we studied the effect of (1) natural variations in patient characteristics across populations and (2) differences in clinical practice on the external validity of ML-based prognostic scores. Overall, a decrease in prognostic accuracy was observed on the external validation set (AUCROC: 0.88, 95% CI 0.88-0.88) compared to the internal validation accuracy (AUCROC: 0.91, 95% CI 0.90-0.92). Based on our ML model, analysis of feature contributions and risk strata revealed that, while external validation of ML models exhibited high precision on average, both factors (1) and (2) can undermine the external validity of ML models in patient subgroups with moderate risk for poor outcomes. A significant boost in prognostic power (F1 score) from 0.33 (95% CI 0.31-0.35) to 0.45 (95% CI 0.45-0.45) was observed in external validation when variations in these subgroups were accounted for in our model. Our study highlighted the significance of external validation of ML models for cystic fibrosis prognostication. The insights uncovered on key risk factors and patient subgroups can be used to guide the cross-population adaptation of ML-based models and inspire new research on applying transfer learning methods for fine-tuning ML models to cope with regional variations in clinical care.
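The abstract reports AUCROC and F1 scores with 95% confidence intervals for internal and external validation, but does not state how the intervals were obtained. The following is a minimal sketch of one common approach, a percentile bootstrap over patients, and is not necessarily the procedure used in the study. The function names (`auroc`, `f1_score`, `bootstrap_ci`, `synthetic_cohort`), the decision threshold, and the data are all hypothetical; the two cohorts are synthetic stand-ins and have no relation to the UK or Canadian registries.

```python
import numpy as np


def auroc(y_true, scores):
    """AUROC from the Mann-Whitney U statistic; tied scores share an average rank."""
    y_true = np.asarray(y_true, dtype=bool).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    _, inv, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0  # mean rank of each tie group
    ranks = avg_rank[inv]
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for binary labels and binary predictions."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(metric, y_true, y_hat, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap (1 - alpha) interval (an assumed method)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_hat = np.asarray(y_hat)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip resamples containing a single class
        stats.append(metric(y_true[idx], y_hat[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y_true, y_hat), lo, hi


# Synthetic illustration only: the "external" cohort has weaker class separation.
rng = np.random.default_rng(42)


def synthetic_cohort(n, separation):
    y = rng.random(n) < 0.2                        # assumed 20% event rate
    s = rng.normal(loc=separation * y, scale=1.0)  # risk score, higher for events
    return y, s


for name, sep in [("internal", 2.0), ("external", 1.5)]:
    y, s = synthetic_cohort(500, sep)
    auc, a_lo, a_hi = bootstrap_ci(auroc, y, s, n_boot=200)
    f1, f_lo, f_hi = bootstrap_ci(f1_score, y, s > 1.0, n_boot=200)  # threshold is arbitrary
    print(f"{name}: AUROC {auc:.2f} ({a_lo:.2f}-{a_hi:.2f}), "
          f"F1 {f1:.2f} ({f_lo:.2f}-{f_hi:.2f})")
```

Resampling whole patients rather than individual predictions keeps each bootstrap replicate comparable to the original cohort; with registry data containing repeated annual visits, resampling at the patient level would be the analogous choice.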

Copyright: © 2023 Qin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Grants and funding

YQ receives a scholarship for his PhD study from the UK Cystic Fibrosis Trust. AF is funded by the US Cystic Fibrosis Foundation and the UK Cystic Fibrosis Trust (Digital Health Research Grant No. DHRP016). The funders had no role in the study design, data processing, model development and analysis, decision to publish, or preparation of the manuscript.