Multiobjective triclustering of time-series transcriptome data reveals key genes of biological processes

BMC Bioinformatics. 2015 Jun 26:16:200. doi: 10.1186/s12859-015-0635-8.

Abstract

Background: Exploratory analysis of multi-dimensional high-throughput datasets, such as microarray gene expression time series, may be instrumental in understanding the genetic programs underlying numerous biological processes. In such datasets, variations in the gene expression profiles are usually observed across replicates and time points. Thus, mining the temporal expression patterns in such multi-dimensional datasets may not only provide insights into the key biological processes governing organ growth and development but also facilitate the understanding of the underlying complex gene regulatory circuits.

Results: In this work, we developed an evolutionary multi-objective optimization approach for our previously introduced triclustering algorithm δ-TRIMAX. Its aim is to make optimal use of δ-TRIMAX in extracting groups of co-expressed genes from time series gene expression data, or from any 3D gene expression dataset, by adding the capabilities of an evolutionary algorithm to retrieve overlapping triclusters. We compared the performance of the newly developed algorithm, EMOA-δ-TRIMAX, with that of other existing triclustering approaches using four artificial datasets and three real-life datasets. Moreover, we analyzed the results of our algorithm on one of these real-life datasets, which monitors the differentiation of human induced pluripotent stem cells (hiPSC) into mature cardiomyocytes. For each group of co-expressed genes belonging to one tricluster, we identified key genes by computing their membership values within the tricluster. These key genes were, to a very high percentage, significantly enriched in Gene Ontology categories or KEGG pathways that fit the biological context of cardiomyocyte differentiation very well.
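
As an illustration of what "multi-objective" means here, the following minimal Python sketch filters candidate triclusters down to a Pareto front. It is not the EMOA-δ-TRIMAX implementation: the abstract does not state the objectives, so the two used here (a residue-like score to minimize and a tricluster volume to maximize) are assumptions, and the names `Score`, `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Tuple

# Each candidate tricluster is scored as (residue, volume).
# ASSUMPTION: both objectives are hypothetical stand-ins; the abstract does
# not name them. residue is minimized, volume is maximized.
Score = Tuple[float, int]


def dominates(a: Score, b: Score) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(scores: List[Score]) -> List[int]:
    """Return the indices of candidates not dominated by any other candidate."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Made-up scores for four candidate triclusters, for illustration only.
candidates = [(0.10, 120), (0.20, 300), (0.25, 280), (0.10, 100)]
front = pareto_front(candidates)
```

With these made-up scores, the third candidate is dominated by the second and the fourth by the first, so only the first two remain on the front. An evolutionary algorithm would typically apply such a filter in every generation to decide which candidates survive.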

Conclusions: EMOA-δ-TRIMAX has proven instrumental in identifying groups of genes in transcriptomic datasets that represent the functional categories constituting the biological process under study. The executable file can be found at http://www.bioinf.med.uni-goettingen.de/fileadmin/download/EMOA-delta-TRIMAX.tar.gz .

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Biological Phenomena
  • Biomarkers / analysis*
  • Cell Differentiation / genetics*
  • Cluster Analysis
  • Datasets as Topic
  • Gene Expression Profiling / methods*
  • Gene Regulatory Networks
  • Humans
  • Induced Pluripotent Stem Cells / cytology
  • Induced Pluripotent Stem Cells / metabolism*
  • Myocytes, Cardiac / cytology
  • Myocytes, Cardiac / metabolism*
  • Oligonucleotide Array Sequence Analysis / methods
  • Time Factors
  • Transcriptome / genetics*

Substances

  • Biomarkers