Adaptively capturing the heterogeneity of expression for cancer biomarker identification

BMC Bioinformatics. 2018 Nov 3;19(1):401. doi: 10.1186/s12859-018-2437-2.

Abstract

Background: Identifying cancer biomarkers from transcriptomics data is important for cancer research. However, transcriptomics data are often complex and heterogeneous, which complicates biomarker identification in practice. This heterogeneity remains a challenge for detecting subtle but consistent changes in gene expression in cancer cells.

Results: In this paper, we propose to adaptively capture the heterogeneity of expression across samples in a gene regulation space rather than in a gene expression space. Specifically, we transform gene expression profiles into gene regulation profiles and mathematically formulate statistics based on gene regulation probabilities (GRPs) to characterize the differential expression of genes between tumor and normal tissues. We then devise an unbiased estimator of GRPs (aGRP) that interrogates and adaptively captures the heterogeneity of gene expression, and we derive an asymptotic significance analysis procedure for the new statistic. Because no parameters need to be preset, aGRP is easy to use for researchers without a programming background. We evaluated the proposed method on both simulated and real-world data and compared it with previous methods. The experimental results demonstrate the superior performance of the proposed method in exploring expression heterogeneity to capture subtle but consistent alterations of gene expression in cancer.
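
To make the idea of working in a regulation space more concrete, the following is a minimal illustrative sketch and not the authors' aGRP estimator, whose definition is not given in this abstract. It assumes a simple pairwise scheme: for one gene, every tumor sample is compared with every normal sample, and the fractions of pairs showing higher or lower expression in tumor serve as rough up- and down-regulation probabilities. The function names and the difference statistic are hypothetical choices made for illustration only.

```python
import numpy as np

def regulation_probabilities(tumor, normal):
    """Rough up/down-regulation probabilities for a single gene.

    Assumption (not from the paper): probabilities are estimated as the
    fraction of tumor-normal sample pairs in which the tumor value is
    higher (up) or lower (down) than the normal value.
    """
    tumor = np.asarray(tumor, dtype=float)
    normal = np.asarray(normal, dtype=float)
    # Broadcast to a (n_tumor, n_normal) matrix of pairwise differences.
    diff = tumor[:, None] - normal[None, :]
    p_up = float(np.mean(diff > 0))
    p_down = float(np.mean(diff < 0))
    return p_up, p_down

def regulation_statistic(tumor, normal):
    """Hypothetical summary statistic: P(up) minus P(down), in [-1, 1]."""
    p_up, p_down = regulation_probabilities(tumor, normal)
    return p_up - p_down

if __name__ == "__main__":
    # Toy expression values for one gene; not real data.
    tumor_expr = [5.0, 6.0, 7.0]
    normal_expr = [1.0, 2.0, 3.0]
    print(regulation_statistic(tumor_expr, normal_expr))
```

Because a pairwise scheme of this kind depends only on the ordering of values between the two groups, it is less sensitive to outlying samples than a comparison of group means, which is one plausible motivation for moving from expression values to regulation calls.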

Conclusions: Expression heterogeneity strongly influences the performance of cancer biomarker identification from transcriptomics data, so models that deal with it efficiently are needed. The proposed method can serve as a standalone tool owing to its capacity to adaptively capture sample heterogeneity and its simplicity of use.

Software availability: The source code of aGRP can be downloaded from https://github.com/hqwang126/aGRP .

Keywords: Cancer biomarkers; Differential expression; Expression complexity; Regulation probability; Transcriptomics data.

MeSH terms

  • Biomarkers, Tumor / genetics*
  • Computer Simulation
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic*
  • Genetic Heterogeneity*
  • Humans
  • Models, Genetic
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis
  • Probability
  • Sequence Analysis, RNA
  • Software
  • Transcriptome

Substances

  • Biomarkers, Tumor