Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models

Bioinformatics. 2015 May 1;31(9):1420-7. doi: 10.1093/bioinformatics/btu845. Epub 2015 Jan 5.

Abstract

Motivation: In recent years, gene expression studies have increasingly made use of high-throughput sequencing technology. In turn, research concerning the appropriate statistical methods for the analysis of digital gene expression (DGE) has flourished, primarily in the context of normalization and differential analysis.

Results: In this work, we focus on the question of clustering DGE profiles as a means to discover groups of co-expressed genes. We propose a Poisson mixture model using a rigorous framework for parameter estimation as well as the choice of the appropriate number of clusters. We illustrate co-expression analyses using our approach on two real RNA-seq datasets. A set of simulation studies also compares the performance of the proposed model with that of several related approaches developed to cluster RNA-seq or serial analysis of gene expression data.
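The clustering idea described above can be illustrated with a minimal sketch. It fits a K-component mixture of independent Poisson distributions to a genes-by-samples count matrix by expectation-maximization (EM), then picks the number of clusters by comparing fits. Several assumptions apply. The model is a generic Poisson mixture, not the HTSCluster parameterization, which the abstract does not spell out. BIC stands in for the paper's model-selection framework, which the abstract does not name. The function names `fit_poisson_mixture`, `bic` and `select_K` are hypothetical.

```python
import math
import numpy as np

# Minimal sketch (assumption): a generic mixture of independent Poissons fitted
# by EM, with BIC as a stand-in model-selection criterion. This is NOT the
# HTSCluster model; all function names here are hypothetical.


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along `axis`.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def fit_poisson_mixture(Y, K, n_iter=200, tol=1e-8, seed=0):
    """EM for a K-component mixture of independent Poissons.

    Y is an (n_genes, n_samples) count matrix; returns (pi, lam, resp, loglik).
    """
    Y = np.asarray(Y, dtype=float)
    n, d = Y.shape
    rng = np.random.default_rng(seed)
    # log(y!) is cluster-independent but needed for the full log-likelihood.
    log_fact = np.vectorize(math.lgamma, otypes=[float])(Y + 1.0).sum(axis=1)
    resp = rng.dirichlet(np.ones(K), size=n)  # random soft initialization
    prev = -np.inf
    for _ in range(n_iter):
        # M-step: mixing proportions and per-cluster Poisson rates.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / n
        lam = (resp.T @ Y) / nk[:, None] + 1e-10  # floor avoids log(0)
        # E-step: log p(y_i | k) = sum_j [y_ij log lam_kj - lam_kj - log y_ij!]
        log_w = np.log(pi) + Y @ np.log(lam).T - lam.sum(axis=1) - log_fact[:, None]
        norm = _logsumexp(log_w, axis=1)
        resp = np.exp(log_w - norm[:, None])  # posterior cluster memberships
        loglik = float(norm.sum())
        if loglik - prev < tol:  # EM never decreases the log-likelihood
            break
        prev = loglik
    return pi, lam, resp, loglik


def bic(loglik, K, n, d):
    # Free parameters: K - 1 mixing proportions plus K * d Poisson rates.
    n_params = (K - 1) + K * d
    return -2.0 * loglik + n_params * math.log(n)


def select_K(Y, K_values, seed=0):
    # Fit each candidate K and keep the one with the smallest BIC.
    Y = np.asarray(Y, dtype=float)
    n, d = Y.shape
    scores = {}
    for K in K_values:
        *_, ll = fit_poisson_mixture(Y, K, seed=seed)
        scores[K] = bic(ll, K, n, d)
    return min(scores, key=scores.get), scores


# Usage: simulated counts for 200 genes in 4 samples, two expression patterns.
rng = np.random.default_rng(1)
Y = np.vstack([
    rng.poisson([5, 5, 50, 50], size=(100, 4)),
    rng.poisson([50, 50, 5, 5], size=(100, 4)),
])
best_K, scores = select_K(Y, [1, 2, 3])
print(best_K, {k: round(v, 1) for k, v in scores.items()})
```

Genes are assigned to co-expression clusters through the posterior memberships in `resp` (for example, `resp.argmax(axis=1)`). Real RNA-seq samples also differ in sequencing depth, and this sketch does not account for that. A practical analysis would need to model library size rather than cluster raw counts directly.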

Availability and implementation: The proposed method is implemented in the open-source R package HTSCluster, available on CRAN.

Contact: andrea.rau@jouy.inra.fr

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cell Line
  • Cluster Analysis
  • Drosophila melanogaster / embryology
  • Drosophila melanogaster / genetics
  • Gene Expression Profiling / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Liver / metabolism
  • Models, Statistical
  • Poisson Distribution
  • Sequence Analysis, RNA / methods*