Interpolation based consensus clustering for gene expression time series

BMC Bioinformatics. 2015 Apr 16:16:117. doi: 10.1186/s12859-015-0541-0.

Abstract

Background: Unsupervised analyses such as clustering are the essential tools required to interpret time-series expression data from microarrays. Several clustering algorithms have been developed to analyze gene expression data. Early methods such as k-means, hierarchical clustering, and self-organizing maps are popular for their simplicity. However, because of noise and uncertainty of measurement, these common algorithms have low accuracy. Moreover, because gene expression is a temporal process, the relationship between successive time points should be considered in the analyses. In addition, biological processes are generally continuous; therefore, the datasets collected from time series experiments are often found to have an insufficient number of data points and, as a result, compensation for missing data can also be an issue.

Results: An affinity propagation-based clustering algorithm for time-series gene expression data is proposed. The algorithm explores the relationship between genes using a sliding-window mechanism to extract a large number of features. In addition, the time-course datasets are resampled with spline interpolation to predict the unobserved values. Finally, a consensus process is applied to enhance the robustness of the method. Some real gene expression datasets were analyzed to demonstrate the accuracy and efficiency of the algorithm.

Conclusion: The proposed algorithm has benefitted from the use of cubic B-splines interpolation, sliding-window, affinity propagation, gene relativity graph, and a consensus process, and, as a result, provides both appropriate and effective clustering of time-series gene expression data. The proposed method was tested with gene expression data from the Yeast galactose dataset, the Yeast cell-cycle dataset (Y5), and the Yeast sporulation dataset, and the results illustrated the relationships between the expressed genes, which may give some insights into the biological processes involved.

MeSH terms

  • Algorithms*
  • Cell Cycle / physiology
  • Cluster Analysis
  • Computer Graphics*
  • Consensus Sequence
  • Galactose / metabolism
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Fungal*
  • Oligonucleotide Array Sequence Analysis / methods
  • Saccharomyces cerevisiae / genetics*
  • Saccharomyces cerevisiae Proteins / genetics*
  • Spores, Fungal / physiology
  • Time Factors

Substances

  • Saccharomyces cerevisiae Proteins
  • Galactose