A Dirichlet process mixture model for clustering longitudinal gene expression data

Stat Med. 2017 Sep 30;36(22):3495-3506. doi: 10.1002/sim.7374. Epub 2017 Jun 15.

Abstract

Subgroup identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly used to define subgroups. Longitudinal gene expression profiles may carry information on disease progression beyond what is captured by baseline profiles alone. Therefore, subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully utilize these data for patient clustering. In this article, we introduce a novel Bayesian clustering method based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects framework to model the trajectory of each gene over time, while clustering is conducted jointly based on the regression coefficients obtained from all genes. To account for the correlations among genes and to alleviate the challenges of high dimensionality, we adopt a factor analysis model for the regression coefficients. A Dirichlet process prior is placed on the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG has improved performance over other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model is able to identify interesting subgroups. Copyright © 2017 John Wiley & Sons, Ltd.
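The generative structure described in the abstract (a linear trajectory per gene, a factor model on the stacked regression coefficients, and a Dirichlet process prior on their means) can be illustrated with a forward simulation. The sketch below is a minimal illustration under stated assumptions, not the authors' BClustLonG implementation: it assumes a random intercept and slope for each patient and gene, draws cluster labels from the Chinese restaurant process representation of the Dirichlet process, uses a Gaussian base measure N(0, 2²I) for the cluster means, and picks all dimensions and noise scales arbitrarily. The function names `crp_assignments` and `simulate` are hypothetical. Only data generation is shown; posterior inference (e.g. by MCMC) is not.

```python
import numpy as np


def crp_assignments(n, alpha, rng):
    """Sample cluster labels for n patients from a Chinese restaurant process,
    the partition distribution induced by a Dirichlet process with concentration alpha."""
    labels = np.empty(n, dtype=int)
    counts = []  # counts[k] = number of patients already assigned to cluster k
    for i in range(n):
        # Existing cluster k has weight counts[k]; opening a new cluster has weight alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)  # new cluster gets the next consecutive label
        else:
            counts[k] += 1
        labels[i] = k
    return labels


def simulate(n=60, n_genes=20, times=(0.0, 1.0, 2.0, 3.0), n_factors=3,
             alpha=1.0, seed=0):
    """Hypothetical forward simulation: DP-clustered coefficient means,
    a factor model for correlated coefficients, and linear gene trajectories."""
    rng = np.random.default_rng(seed)
    times = np.asarray(times)
    p = 2 * n_genes  # one intercept and one slope per gene, stacked

    # Dirichlet process prior on the coefficient means (assumed Gaussian base measure).
    labels = crp_assignments(n, alpha, rng)
    n_clusters = labels.max() + 1
    mu = rng.normal(0.0, 2.0, size=(n_clusters, p))

    # Factor analysis: shared latent factors induce correlation across genes.
    loadings = rng.normal(0.0, 0.5, size=(p, n_factors))   # Lambda, p x K
    factors = rng.normal(size=(n, n_factors))               # f_i ~ N(0, I_K)
    idio = rng.normal(0.0, 0.3, size=(n, p))                # idiosyncratic noise
    coef = mu[labels] + factors @ loadings.T + idio          # shape (n, p)

    intercepts = coef[:, :n_genes]
    slopes = coef[:, n_genes:]

    # Linear trajectory per patient and gene plus measurement error: y[i, g, t].
    y = (intercepts[:, :, None]
         + slopes[:, :, None] * times[None, None, :]
         + rng.normal(0.0, 0.5, size=(n, n_genes, len(times))))
    return y, labels, coef
```

For example, `y, labels, coef = simulate()` returns an array `y` of shape (60, 20, 4) (patients × genes × time points), the true cluster labels, and the 60 × 40 coefficient matrix. Because the term `factors @ loadings.T` is shared across all intercepts and slopes, the coefficients are correlated across genes, which is the kind of dependence the factor model is meant to capture while keeping the number of parameters manageable.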

Keywords: Bayesian factor analysis; Bayesian nonparametrics; clustering; longitudinal gene expression study.

Publication types

  • Comparative Study

MeSH terms

  • Bayes Theorem*
  • Burns
  • Cluster Analysis*
  • Computer Simulation
  • Factor Analysis, Statistical*
  • Gene Expression
  • Gene Expression Profiling / methods*
  • Humans
  • Markov Chains
  • Models, Genetic*
  • Monte Carlo Method
  • Regression Analysis*
  • Statistics, Nonparametric