Directionally dependent multi-view clustering using copula model

PLoS One. 2020 Oct 23;15(10):e0238996. doi: 10.1371/journal.pone.0238996. eCollection 2020.

Abstract

Recent developments in high-throughput methods have resulted in the collection of high-dimensional data types from multiple sources and technologies that measure distinct yet complementary information. Integrated clustering of such multiple data types, or multi-view clustering, is critical for revealing pathological insights. However, multi-view clustering is challenging due to the complex dependence structure between multiple data types, including directional dependency. Specifically, genomics data types have pre-specified directional dependencies, known as the central dogma, which describes the flow of information from DNA to messenger RNA (mRNA) and then from mRNA to protein. Most existing multi-view clustering approaches assume either an independent structure or pair-wise (non-directional) dependence between data types, thereby ignoring their directional relationship. Motivated by this, we propose a biology-inspired Bayesian integrated multi-view clustering model that uses an asymmetric copula to accommodate the directional dependencies between the data types. Via extensive simulation experiments, we demonstrate the negative impact of ignoring directional dependency on clustering performance. We also present an application of our model to a real-world dataset of breast cancer tumor samples collected from The Cancer Genome Atlas program and provide comparative results.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bayes Theorem
  • Breast Neoplasms / genetics
  • Cluster Analysis
  • Computer Simulation
  • Data Interpretation, Statistical
  • Databases, Genetic / statistics & numerical data
  • Female
  • Genomics / methods*
  • Genomics / statistics & numerical data
  • Humans
  • Markov Chains
  • Models, Statistical*
  • Normal Distribution