A semi-parametric statistical model for integrating gene expression profiles across different platforms

BMC Bioinformatics. 2016 Jan 11;17 Suppl 1(Suppl 1):5. doi: 10.1186/s12859-015-0847-y.

Abstract

Background: Determining differentially expressed genes (DEGs) between biological samples is key to understanding how genotype gives rise to phenotype. RNA-seq and microarray are the two main technologies for profiling gene expression levels. However, considerable discrepancy has been found between the DEGs detected using the two technologies. Integrating data across these two platforms has the potential to improve the power and reliability of DEG detection.

Methods: We propose a rank-based semi-parametric model to determine DEGs using information across different sources and apply it to the integration of RNA-seq and microarray data. By incorporating both the significance of differential expression and the consistency across platforms, our method effectively detects DEGs with moderate but consistent signals. We demonstrate the effectiveness of our method using simulation studies, MAQC/SEQC data and a synthetic microRNA dataset.
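The idea of rewarding genes whose signal is moderate but consistent across platforms can be illustrated with a deliberately simple rank-based score. This is not the semi-parametric model proposed in the paper; it is a hypothetical sketch that assumes one per-gene test statistic per platform (for example, a moderated t-statistic from each platform). Each gene's absolute statistic is converted to a normalized rank in (0, 1) within its platform, and the gene's score is the smaller of its two ranks when the signs agree and 0 when they disagree. The function names `normalized_ranks` and `consistency_score` are illustrative only.

```python
import numpy as np


def normalized_ranks(values):
    """Map values to (0, 1): the smallest gets 1/(n+1), the largest n/(n+1).

    Ties are broken by position (stable sort); a fuller treatment would average them.
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks / (len(values) + 1)


def consistency_score(stat_a, stat_b):
    """Hypothetical rank-based score combining significance and cross-platform consistency.

    A gene scores high only if it ranks high on BOTH platforms with the same sign.
    """
    stat_a = np.asarray(stat_a, dtype=float)
    stat_b = np.asarray(stat_b, dtype=float)
    u_a = normalized_ranks(np.abs(stat_a))  # significance rank on platform A
    u_b = normalized_ranks(np.abs(stat_b))  # significance rank on platform B
    same_direction = np.sign(stat_a) == np.sign(stat_b)
    # The minimum penalizes genes that are strong on one platform but weak on the other.
    return np.where(same_direction, np.minimum(u_a, u_b), 0.0)


# Made-up statistics for five genes, for illustration only.
rnaseq_stats = [5.0, 2.5, 0.3, -4.0, 2.4]
array_stats = [0.5, 2.6, 0.2, 3.9, 2.3]
scores = consistency_score(rnaseq_stats, array_stats)
print(scores)
```

In this toy input, gene 1 (moderate on both platforms) receives the top score of 0.5, gene 0 (strong on RNA-seq only) is pulled down to about 0.33, and gene 3 (opposite signs) scores 0. Taking the minimum is just one simple way to encode "significant and consistent"; the paper's model is semi-parametric and adapts to the structure of the data, which this sketch does not attempt.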

Conclusions: Our integration method is not only robust to noise and heterogeneity in the data but also adaptive to the structure of the data. In our simulations and real data studies, our approach shows higher discriminative power and identifies more biologically relevant DEGs than eBayes, DESeq and some commonly used meta-analysis methods.
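The abstract does not name the meta-analysis methods used for comparison. As a point of reference only, Fisher's combined probability test is one widely used way to merge per-platform p-values for a gene, and it is shown here as a generic baseline, not as a claim about which methods the paper evaluated. Under the null hypothesis, the statistic X = -2 * sum(log p_i) over k independent p-values follows a chi-square distribution with 2k degrees of freedom, and for even degrees of freedom its survival function has a closed form, so the sketch needs only the standard library. The function name `fisher_combined_pvalue` is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method for k independent p-values.

    X = -2 * sum(log p_i) ~ chi-square with 2k degrees of freedom under the null, and
    P(chi2_{2k} > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# A gene with p = 0.05 on both RNA-seq and microarray.
print(fisher_combined_pvalue([0.05, 0.05]))
```

With two p-values of 0.05 the combined p-value is about 0.0175. Note that Fisher's method rewards strong evidence on either platform and does not by itself require the two platforms to agree, which is the kind of behavior a consistency-aware integration approach aims to improve on.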

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression Profiling / methods
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis / methods*
  • RNA / genetics*
  • Reproducibility of Results
  • Sequence Analysis, RNA / methods*
  • Transcriptome*

Substances

  • RNA