A machine learning approach to identify groups of patients with hematological malignant disorders

Comput Methods Programs Biomed. 2024 Apr:246:108011. doi: 10.1016/j.cmpb.2024.108011. Epub 2024 Jan 9.

Abstract

Background and objective: Vaccination against SARS-CoV-2 is crucial to reduce the severity of COVID-19 in immunocompromised patients with hematologic malignancies (HM). Despite vaccination efforts, over a third of HM patients fail to mount a serological response, which increases their risk of severe breakthrough infections. This study aims to use machine learning, which can adapt to the changing dynamics of COVID-19, to select patient-specific features efficiently, improve prediction of the vaccine response, and inform healthcare strategies. Given the complex relationship between COVID-19 and hematological disease, the focus is on interpretable machine learning that provides useful insights to clinicians and biologists.

Methods: The study evaluated a dataset of 1166 patients with hematological diseases. The outcome was whether or not a serological response was achieved after full COVID-19 vaccination. Various machine learning methods were applied, and the best model was selected based on metrics such as the Area Under the Curve (AUC), Sensitivity, Specificity, and the Matthews Correlation Coefficient (MCC). Individual SHAP values were computed for the best model, and Principal Component Analysis (PCA) was applied to these values. The patient profiles were then analyzed within the identified clusters.
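
For readers less familiar with the selection metrics named above, the following is a minimal sketch of how they can be computed for a binary responder/non-responder outcome. It is not the authors' code: the function names and the toy labels and scores are assumptions made for illustration, and the AUC is computed with the pairwise (Mann-Whitney) formulation rather than any particular library routine.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = responder)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn, fp):
    """Fraction of actual negatives that were predicted negative."""
    return tn / (tn + fp) if (tn + fp) else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical values, not taken from the study).
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6]
toy_counts = confusion_counts(toy_true, toy_pred)
```

MCC is often reported alongside sensitivity and specificity because it uses all four cells of the confusion matrix, which makes it informative when the two outcome classes are unbalanced.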

Results: The support vector machine (SVM) emerged as the best-performing model. PCA applied to the SVM-derived SHAP values yielded four perfectly separated clusters. These clusters are characterized by the proportion of patients who generate antibodies (PPGA). Cluster 1, with the second-highest PPGA (69.91%), included patients with aggressive diseases and factors contributing to increased immunodeficiency. Cluster 2 had the lowest PPGA (33.3%), but its small sample size prevented conclusive findings. Cluster 3, representing the majority of the population, exhibited a high rate of antibody generation (84.39%) and a better prognosis than Cluster 1. Cluster 4, with a PPGA of 66.33%, included patients with B-cell non-Hodgkin's lymphoma on corticosteroid therapy.
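
As an illustration of the PPGA measure used to characterize the clusters, the sketch below computes the percentage of responders per cluster from a cluster assignment and a binary response flag. The function name and the toy data are hypothetical and do not reproduce the study's patients or its reported percentages.

```python
from collections import defaultdict

def ppga_by_cluster(cluster_labels, responded):
    """Return {cluster: percentage of patients with a serological response}."""
    totals = defaultdict(int)
    responders = defaultdict(int)
    for cluster, flag in zip(cluster_labels, responded):
        totals[cluster] += 1
        if flag:
            responders[cluster] += 1
    return {c: 100.0 * responders[c] / totals[c] for c in sorted(totals)}

# Toy data (hypothetical): cluster assignment and response flag per patient.
toy_clusters = [1, 1, 1, 2, 2, 3, 3, 3, 3]
toy_responded = [1, 1, 0, 0, 1, 1, 1, 1, 0]
toy_ppga = ppga_by_cluster(toy_clusters, toy_responded)
```

With very few patients in a cluster, a single responder shifts the percentage substantially, which is one reason a small cluster such as Cluster 2 does not support firm conclusions.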

Conclusions: The methodology successfully identified four separate patient clusters using Machine Learning and Explainable AI (XAI). We then analyzed each cluster based on the percentage of HM patients who generated antibodies after COVID-19 vaccination. The study suggests the methodology's potential applicability to other diseases, highlighting the importance of interpretable ML in healthcare research and decision-making.

Keywords: COVID-19; Explainable AI (XAI); Hematological disease; High risk groups identification; Machine learning; SARS-CoV-2 mRNA vaccines; Serological response.

MeSH terms

  • Area Under Curve
  • COVID-19 Vaccines
  • COVID-19*
  • Hematologic Diseases*
  • Humans
  • Machine Learning

Substances

  • COVID-19 Vaccines