Identifying homogeneous subgroups of patients and important features: a topological machine learning approach

Ewan Carr; Mathieu Carrière; Bertrand Michel; Frédéric Chazal; Raquel Iniesta

doi:10.1186/s12859-021-04360-9

BMC Bioinformatics. 2021 Sep 20;22(1):449. doi: 10.1186/s12859-021-04360-9.

Authors

Ewan Carr¹, Mathieu Carrière², Bertrand Michel³, Frédéric Chazal⁴, Raquel Iniesta⁵

Affiliations

¹ Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
² Inria Sophia-Antipolis, DataShape Team, Biot, France.
³ Ecole Centrale de Nantes, LMJL - UMR CNRS 6629, Nantes, France.
⁴ Inria Saclay, Ile-de-France, Alan Turing Building, Palaiseau, France.
⁵ Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK. raquel.iniesta@kcl.ac.uk.

Abstract

Background: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph.

Results: We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper.
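As a rough illustration of what Mapper computes, the sketch below builds a toy Mapper graph from a point cloud using only the Python standard library. It is not the authors' pipeline and leaves out the bootstrap and cluster-selection steps. The function names (`single_linkage`, `mapper_graph`), the parameters (`n_intervals`, `overlap`, `eps`) and the choice of single-linkage clustering are assumptions made for the example.

```python
from itertools import combinations
from math import dist


def single_linkage(points, idx, eps):
    """Split the indices in idx into clusters; points within eps are linked."""
    clusters, unseen = [], set(idx)
    while unseen:
        stack = [unseen.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen if dist(points[i], points[j]) <= eps}
            unseen -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filt, n_intervals=4, overlap=0.3, eps=0.5):
    """Return (nodes, edges) of a toy Mapper graph (illustrative only)."""
    values = [filt(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    pad = overlap * length  # how far each interval extends into its neighbours
    nodes = []
    for k in range(n_intervals):
        a = lo + k * length - pad
        b = lo + (k + 1) * length + pad
        # Preimage of the k-th interval under the filter function
        idx = [i for i, v in enumerate(values) if a <= v <= b]
        # Each cluster in the preimage becomes one node of the graph
        nodes.extend(single_linkage(points, idx, eps))
    # Two nodes are joined by an edge when they share at least one point
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# Hypothetical data: 13 evenly spaced points on a line, filtered by x-coordinate
points = [(0.25 * i, 0.0) for i in range(13)]
nodes, edges = mapper_graph(points, filt=lambda p: p[0],
                            n_intervals=3, overlap=0.3, eps=0.3)
```

For this line-shaped point cloud the resulting graph is a simple path. Each node is a set of point indices, so clusters can be read directly from the nodes.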

Conclusions: Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at https://github.com/kcl-bhi/mapper-pipeline.

Keywords: Clustering; Machine learning; Topological data analysis.

MeSH terms

Algorithms*
Cluster Analysis
Data Analysis
Humans
Machine Learning*

Grants and funding

26338/Brain and Behavior Research Foundation