A machine learning approach to predicting risk of myelodysplastic syndrome

Leuk Res. 2021 Oct:109:106639. doi: 10.1016/j.leukres.2021.106639. Epub 2021 Jun 8.

Abstract

Background: Early myelodysplastic syndrome (MDS) diagnosis can allow physicians to provide early treatment, which may delay advancement of MDS and improve quality of life. However, MDS often goes unrecognized and is difficult to distinguish from other disorders. We developed a machine learning algorithm for the prediction of MDS one year prior to clinical diagnosis of the disease.

Methods: Retrospective analysis was performed on 790,470 patients over the age of 45 seen in the United States between 2007 and 2020. A gradient boosted decision tree model (XGB) was built to predict MDS diagnosis using vital signs, lab results, and demographics from the prior two years of patient data. The XGB model was compared to logistic regression (LR) and artificial neural network (ANN) models. The models did not use blast percentage and cytogenetics information as inputs. Predictions were made one year prior to MDS diagnosis as determined by International Classification of Diseases (ICD) codes, 9th and 10th revisions. Performance was assessed with regard to area under the receiver operating characteristic curve (AUROC).
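The timing described in the Methods can be made concrete with a small sketch. It assumes, as one reading of the abstract, that inputs come from the two years ending at the prediction date, which itself falls one year before the ICD-coded diagnosis. The function names, the 365-day and 730-day constants, and the example observations are hypothetical illustrations, not the authors' pipeline or data.

```python
from datetime import date, timedelta

def feature_window(diagnosis_date, gap_days=365, lookback_days=730):
    """Return (window_start, prediction_date) for one patient.

    Assumption: prediction is made gap_days before diagnosis, and
    inputs are drawn from the lookback_days before that point.
    """
    prediction_date = diagnosis_date - timedelta(days=gap_days)
    window_start = prediction_date - timedelta(days=lookback_days)
    return window_start, prediction_date

def select_observations(observations, diagnosis_date):
    """Keep only observations dated inside the feature window."""
    start, end = feature_window(diagnosis_date)
    return [obs for obs in observations if start <= obs[0] < end]

# Made-up (date, measurement, value) tuples for illustration only.
example_observations = [
    (date(2011, 1, 15), "hemoglobin", 13.9),   # before the window
    (date(2013, 3, 10), "hemoglobin", 12.1),   # inside the window
    (date(2014, 5, 20), "platelets", 140.0),   # inside the window
    (date(2014, 11, 2), "hemoglobin", 10.4),   # within one year of diagnosis
]
kept = select_observations(example_observations, date(2015, 6, 1))
```

Excluding everything dated after the prediction date keeps information from the year before diagnosis out of the model inputs, which is what makes the task a one-year-ahead prediction under this reading.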

Results: On a hold-out test set, the XGB model achieved an AUROC of 0.87 for prediction of MDS one year prior to diagnosis, with a sensitivity of 0.79 and a specificity of 0.80. The LR and ANN comparison models achieved AUROCs of 0.838 and 0.832, respectively.
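For readers less familiar with the reported metrics, the following minimal sketch shows how AUROC, sensitivity, and specificity can be computed from labels and predicted risk scores. The labels, scores, and 0.5 threshold are made-up illustrations, not study data, and the abstract does not state which operating threshold the authors used.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP),
    calling a case positive when its score is >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: 1 = later diagnosed with MDS, 0 = not diagnosed.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]

toy_auroc = auroc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, 0.5)
```

AUROC summarizes ranking quality across all thresholds, whereas sensitivity and specificity describe a single chosen threshold, so the two kinds of figure in the Results are complementary.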

Conclusions: Machine learning may allow for early MDS diagnosis and more appropriate treatment administration.

Keywords: Early prediction; Electronic health records (EHR); Machine learning; Myelodysplastic syndrome (MDS); Risk assessment.

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Case-Control Studies
  • Female
  • Follow-Up Studies
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Myelodysplastic Syndromes / diagnosis*
  • Myelodysplastic Syndromes / epidemiology
  • Neural Networks, Computer*
  • Prognosis
  • Quality of Life*
  • ROC Curve
  • Retrospective Studies
  • Risk Assessment / methods*
  • United States / epidemiology