The shape of cancer relapse: Topological data analysis predicts recurrence in paediatric acute lymphoblastic leukaemia

PLoS Comput Biol. 2023 Aug 14;19(8):e1011329. doi: 10.1371/journal.pcbi.1011329. eCollection 2023 Aug.

Abstract

Although children and adolescents with acute lymphoblastic leukaemia (ALL) have high survival rates, approximately 15-20% of patients relapse. Risk of relapse is routinely estimated at diagnosis by biological factors, including flow cytometry data. This high-dimensional data is typically manually assessed by projecting it onto a subset of biomarkers. Cell density and "empty spaces" in 2D projections of the data, i.e. regions devoid of cells, are then used for qualitative assessment. Here, we use topological data analysis (TDA), which quantifies shapes, including empty spaces, in data, to analyse pre-treatment ALL datasets with known patient outcomes. We combine these fully unsupervised analyses with Machine Learning (ML) to identify significant shape characteristics and demonstrate that they accurately predict risk of relapse, particularly for patients previously classified as 'low risk'. We independently confirm the predictive power of CD10, CD20, CD38, and CD45 as biomarkers for ALL diagnosis. Based on our analyses, we propose three increasingly detailed prognostic pipelines for analysing flow cytometry data from ALL patients depending on technical and technological availability: 1. Visual inspection of specific biological features in biparametric projections of the data; 2. Computation of quantitative topological descriptors of such projections; 3. A combined analysis, using TDA and ML, in the four-parameter space defined by CD10, CD20, CD38 and CD45. Our analyses readily extend to other haematological malignancies.
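As a rough illustration of how "empty spaces" in a biparametric projection can be turned into a number, the sketch below bins a 2D point cloud and counts empty regions that are fully enclosed by occupied bins. This is not the authors' pipeline: persistent homology, as used in TDA, tracks such features across all scales, whereas this sketch looks at a single, fixed resolution. The function name `count_enclosed_empty_regions`, the parameters `bins` and `min_count`, and the assumption that marker intensities are rescaled to [0, 1] are all hypothetical choices made for the example.

```python
from collections import deque

import numpy as np


def count_enclosed_empty_regions(points, bins=20, min_count=1):
    """Count empty regions enclosed by occupied bins in a 2D projection.

    points    : (n, 2) array of two marker intensities, assumed scaled to [0, 1].
    bins      : number of bins per axis (hypothetical resolution parameter).
    min_count : a bin counts as occupied if it holds at least this many cells.
    """
    points = np.asarray(points, dtype=float)
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=bins, range=((0.0, 1.0), (0.0, 1.0))
    )
    occupied = counts >= min_count

    # Pad with a ring of empty bins so all outer empty space forms one region.
    empty = np.pad(~occupied, 1, constant_values=True)
    seen = np.zeros(empty.shape, dtype=bool)
    n_rows, n_cols = empty.shape
    n_regions = 0

    # Flood-fill empty bins using 4-connectivity (edge-sharing neighbours),
    # which pairs with treating occupied bins as 8-connected.
    for r0, c0 in zip(*np.nonzero(empty)):
        if seen[r0, c0]:
            continue
        n_regions += 1
        seen[r0, c0] = True
        queue = deque([(r0, c0)])
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (
                    0 <= rr < n_rows
                    and 0 <= cc < n_cols
                    and empty[rr, cc]
                    and not seen[rr, cc]
                ):
                    seen[rr, cc] = True
                    queue.append((rr, cc))

    # One region is always the outer background; the rest are enclosed.
    return n_regions - 1


# Synthetic example (not patient data): points sampled on a circle.
angles = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
ring = np.column_stack([0.5 + 0.3 * np.cos(angles), 0.5 + 0.3 * np.sin(angles)])
print(count_enclosed_empty_regions(ring, bins=20))
```

At a single threshold this count corresponds to the number of one-dimensional holes in the union of occupied bins. The result depends strongly on `bins` and `min_count`, which is one reason multi-scale descriptors such as persistent homology are preferred for real data.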

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Child
  • Flow Cytometry
  • Hematologic Neoplasms*
  • Humans
  • Immunophenotyping
  • Neoplasm Recurrence, Local
  • Precursor Cell Lymphoblastic Leukemia-Lymphoma* / pathology
  • Recurrence

Grants and funding

This work was partially supported by the Fundación Española para la Ciencia y la Tecnología (FECYT project PR214 to M.R.), the Asociación Pablo Ugarte (APU, Spain, to M.R.), the Junta de Andalucía (Spain) group FQM-201 (to M.R.), the Junta de Comunidades de Castilla-La Mancha (grant number SBPLY/21/180501/000145 to V.M.P.-G.), the Programme of Research and Transfer Promotion from the University of Cádiz (grant number EST2020-025 to S.C.), the Ministry of Science and Technology, Spain (grant number PID2019-110895RB-I00 to V.M.P.-G.), and the Spanish National Plan for Scientific and Technical Research and Innovation (grant number PDC2022-133520-I00 to V.M.P.-G.). This work was also subsidized by a grant for research and biomedical innovation in the health sciences within the framework of the Integrated Territorial Initiative (ITI) for the province of Cádiz (grant number ITI-0038-2019 to M.R., 80% co-financed by the funds of the FEDER Operational Program of Andalusia 2014-2020, European Regional Development Fund, Council of Health and Families). B.J.S. and H.M.B. are members of the Centre for Topological Data Analysis, funded by EPSRC grant EP/R018472/1. B.J.S. is further supported by the L’Oréal-UNESCO UK and Ireland For Women in Science Rising Talent Programme. S.C. was hired at the University of Cádiz with funding from the ITI project ITI-0038-2019. Á.M.-R. was hired at the University of Cádiz with funding from the APU project. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.