Clustering of gene expression data via normal mixture models

Methods Mol Biol. 2013;972:103-19. doi: 10.1007/978-1-60327-337-4_7.

Abstract

There are two distinct but related clustering problems with microarray data. The first concerns clustering the tissue samples (gene signatures) on the basis of the genes; the second concerns clustering the genes on the basis of the tissues (gene profiles). The tissue clusters obtained in the first problem can play a useful role in discovering and understanding new subclasses of diseases. The gene clusters obtained in the second problem can be used to search for genetic pathways or for groups of genes that might be co-regulated. In the first problem, we may also wish to begin by summarizing the information in the very large number of genes, clustering them into groups (of hyperspherical shape) that can each be represented by a metagene, such as the group sample mean. The tissues can then be clustered in terms of these metagenes. We focus here on mixtures of normal distributions to provide a model-based clustering of both tissue samples (gene signatures) and gene profiles.
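The two-stage procedure described in the abstract (cluster the genes into hyperspherical groups, summarize each group by its sample mean as a metagene, then cluster the tissues on those metagenes) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: both stages fit a normal mixture with spherical covariances (sigma_k^2 I) by EM, the component means are initialized by a deterministic farthest-point rule, a small variance floor guards against degenerate components, and the function names (`fit_spherical_gmm`, `metagenes`) and the toy 8-gene by 6-tissue matrix are hypothetical.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_spherical_gmm(X, k, n_iter=200, tol=1e-8, var_floor=1e-6):
    """Fit a k-component normal mixture with covariances sigma_k^2 * I by EM.

    Hypothetical helper. X is a list of n equal-length lists (points in p
    dimensions). Returns (labels, means, weights); labels[i] is the component
    with the largest estimated posterior probability for point i.
    """
    n, p = len(X), len(X[0])
    # Deterministic farthest-point initialisation of the means (an assumption).
    means = [list(X[0])]
    while len(means) < k:
        far = max(X, key=lambda x: min(_sq_dist(x, m) for m in means))
        means.append(list(far))
    grand = [sum(x[j] for x in X) / n for j in range(p)]
    start_var = max(sum(_sq_dist(x, grand) for x in X) / (n * p), var_floor)
    variances = [start_var] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    tau = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities via log-sum-exp.
        tau, ll = [], 0.0
        for x in X:
            logs = [math.log(weights[c])
                    - 0.5 * p * math.log(2 * math.pi * variances[c])
                    - _sq_dist(x, means[c]) / (2 * variances[c])
                    for c in range(k)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(v - top) for v in logs))
            ll += lse
            tau.append([math.exp(v - lse) for v in logs])
        # M-step: update mixing proportions, means and spherical variances.
        for c in range(k):
            nc = sum(t[c] for t in tau)
            if nc < 1e-10:
                continue  # keep an (almost) empty component's parameters
            weights[c] = nc / n
            means[c] = [sum(t[c] * x[j] for t, x in zip(tau, X)) / nc
                        for j in range(p)]
            ss = sum(t[c] * _sq_dist(x, means[c]) for t, x in zip(tau, X))
            variances[c] = max(ss / (nc * p), var_floor)
        total = sum(weights)
        weights = [w / total for w in weights]
        # Stop once the log-likelihood gains less than tol per iteration.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    labels = [max(range(k), key=lambda c: t[c]) for t in tau]
    return labels, means, weights


def metagenes(expr, gene_labels, g):
    """Average the gene profiles in each gene cluster (group sample means).

    expr[i][t] is the expression of gene i in tissue t. Returns a
    tissues-by-metagenes matrix; empty gene clusters are skipped.
    """
    n_tissues = len(expr[0])
    sums = [[0.0] * n_tissues for _ in range(g)]
    counts = [0] * g
    for row, lab in zip(expr, gene_labels):
        counts[lab] += 1
        for t, v in enumerate(row):
            sums[lab][t] += v
    groups = [c for c in range(g) if counts[c] > 0]
    return [[sums[c][t] / counts[c] for c in groups] for t in range(n_tissues)]


# Hypothetical toy data: 8 genes x 6 tissues; tissues 0-2 and 3-5 differ.
expr = [
    [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    [5.1, 4.8, 5.0, 1.2, 0.8, 1.0],
    [4.9, 5.1, 5.2, 0.9, 1.0, 1.1],
    [5.2, 5.0, 4.8, 1.1, 0.9, 1.2],
    [1.0, 0.9, 1.1, 5.0, 5.1, 4.9],
    [1.1, 1.0, 0.8, 4.9, 5.2, 5.0],
    [0.9, 1.2, 1.0, 5.1, 4.8, 5.2],
    [1.2, 1.1, 0.9, 4.8, 5.0, 5.1],
]
gene_labels, _, _ = fit_spherical_gmm(expr, k=2)          # stage 1: genes
tissue_features = metagenes(expr, gene_labels, g=2)        # metagenes
tissue_labels, _, _ = fit_spherical_gmm(tissue_features, k=2)  # stage 2
print(gene_labels, tissue_labels)
```

The spherical form matches the hyperspherical gene groups mentioned in the abstract; using it for the tissue stage as well is only a simplification for this sketch, and a full treatment would consider other covariance structures and choose the number of components from the data.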

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Cluster Analysis
  • Data Interpretation, Statistical*
  • Gene Expression Profiling / methods*
  • Humans
  • Linear Models
  • Male
  • Normal Distribution
  • Oligonucleotide Array Sequence Analysis / methods*
  • Prostatic Neoplasms / genetics
  • Transcriptome