Clustering Clinical Data in R

Ana Pina; Maria Paula Macedo; Roberto Henriques

doi:10.1007/978-1-4939-9744-2_14


Methods Mol Biol. 2020;2051:309-343. doi: 10.1007/978-1-4939-9744-2_14.

Authors

Ana Pina¹ ² ³; Maria Paula Macedo⁴ ⁵ ⁶; Roberto Henriques⁷

Affiliations

¹ Centro de Estudos de Doenças Crónicas (CEDOC), NOVA Medical School-Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal. ana.pina@nms.unl.pt.
² ProRegeM PhD Programme, NOVA Medical School/Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal. ana.pina@nms.unl.pt.
³ Department of Medical Sciences, Institute of Biomedicine, University of Aveiro, Aveiro, Portugal. ana.pina@nms.unl.pt.
⁴ Centro de Estudos de Doenças Crónicas (CEDOC), NOVA Medical School-Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal.
⁵ Department of Medical Sciences, Institute of Biomedicine, University of Aveiro, Aveiro, Portugal.
⁶ APDP-Diabetes Portugal Education and Research Center (APDP-ERC), Lisbon, Portugal.
⁷ NOVA Information Management School (NOVA IMS), Universidade NOVA de Lisboa, Lisbon, Portugal.

PMID: 31552636

Abstract

We are currently witnessing a paradigm shift from evidence-based medicine to precision medicine, made possible by enormous technological development. Advances in data mining algorithms will allow us to integrate trans-omics with clinical data, contributing to our understanding of pathological mechanisms and having a major impact on the clinical sciences. Cluster analysis is one of the main data mining techniques and allows the exploration of data patterns that the human mind cannot capture.

This chapter focuses on the cluster analysis of clinical data using the statistical software R. We outline the cluster analysis process, underlining some characteristics of clinical data. Starting with the data preprocessing step, we then discuss the advantages and disadvantages of the most commonly used clustering algorithms and point to examples of their applications in clinical work. Finally, we briefly discuss how to validate clusters. Throughout the chapter, we highlight R packages suitable for each computational step of cluster analysis.

Keywords: Clinical data; Cluster analysis; Cluster optimization; Cluster stability; Cluster tendency; Cluster validation; Stratification.

MeSH terms

Algorithms
Cluster Analysis*
Data Mining*
Humans
Precision Medicine*
Software*