Clustering Scatter Plots Using Data Depth Measures

Zhanpan Zhang; Xinping Cui; Daniel R Jeske; Xiaoxiao Li; Jonathan Braun; James Borneman

doi:10.4172/2155-6180.S5-001

J Biom Biostat. 2011:Suppl 5:001. doi: 10.4172/2155-6180.S5-001.

Authors

Zhanpan Zhang¹, Xinping Cui¹, Daniel R Jeske¹, Xiaoxiao Li², Jonathan Braun³, James Borneman⁴

Affiliations

¹ Department of Statistics, University of California, Riverside, CA, USA.
² Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, USA.
³ Department of Molecular and Medical Pharmacology, University of California, Los Angeles, CA, USA; Department of Pathology and Laboratory Medicine, University of California, Los Angeles, CA, USA.
⁴ Department of Plant Pathology and Microbiology, University of California, Riverside, CA, USA.

Abstract

Clustering is rapidly becoming a powerful data mining technique and has been broadly applied in many domains, such as bioinformatics and text mining. However, existing methods can only handle a data matrix of scalars. In this paper, we introduce a hierarchical clustering procedure that can handle a data matrix whose entries are scatter plots. To reflect the nature of the data more accurately, we introduce a dissimilarity statistic based on "data depth" that measures the discrepancy between two bivariate distributions without oversimplifying the underlying pattern. We then combine hypothesis testing with hierarchical clustering to simultaneously cluster the rows and columns of the data matrix of scatter plots. We also propose novel painting metrics and construct heat maps to visualize the clusters. We demonstrate the utility and power of the new clustering method through simulation studies and an application to a microbe-host interaction study.

Keywords: Clustering; Data Depth; Quality Index; Scatter Plot; Visualization.
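
The abstract does not state which depth function or dissimilarity statistic is used, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes Mahalanobis depth, a Liu-Singh style quality index Q(F, G) = P(D_F(X) <= D_F(Y)), and a hypothetical symmetric dissimilarity built from it (`depth_dissimilarity` is an illustrative name, not the paper's statistic). It clusters a one-dimensional list of scatter plots with average linkage and leaves out the hypothesis testing, two-way clustering, and heat maps described above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def mahalanobis_depth(points, reference):
    """Mahalanobis depth of each row of `points` relative to the sample `reference`."""
    mu = reference.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    diff = points - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distances
    return 1.0 / (1.0 + d2)


def quality_index(x, y):
    """Estimate Q(F, G) = P(D_F(X) <= D_F(Y)), with x sampled from F and y from G."""
    depth_x = mahalanobis_depth(x, x)
    depth_y = mahalanobis_depth(y, x)
    # Fraction of (x_i, y_j) pairs in which y_j is at least as deep as x_i.
    return np.mean(depth_x[:, None] <= depth_y[None, :])


def depth_dissimilarity(x, y):
    """Assumed (hypothetical) symmetric statistic: near 0 when both indices are near 0.5."""
    return abs(quality_index(x, y) - 0.5) + abs(quality_index(y, x) - 0.5)


# Synthetic scatter plots: three centred at (0, 0) and three centred at (3, 3).
rng = np.random.default_rng(0)
plots = [rng.normal(loc=0.0, size=(200, 2)) for _ in range(3)]
plots += [rng.normal(loc=3.0, size=(200, 2)) for _ in range(3)]

# Pairwise dissimilarity matrix (symmetric, zero diagonal).
n = len(plots)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = depth_dissimilarity(plots[i], plots[j])

# Average-linkage hierarchical clustering, cut into two clusters.
tree = linkage(squareform(dist), method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

When two scatter plots come from the same distribution, both quality indices are close to 0.5 and the dissimilarity is close to 0; a location shift or a change in spread pushes at least one index away from 0.5. Other depth functions (for example, halfspace or simplicial depth) could be substituted for `mahalanobis_depth` without changing the rest of the sketch.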
