OMICfpp: a fuzzy approach for paired RNA-Seq counts

BMC Genomics. 2019 Apr 2;20(1):259. doi: 10.1186/s12864-019-5496-5.

Abstract

Background: RNA sequencing is a widely used technology for differential expression analysis. However, RNA-Seq does not provide accurate absolute measurements, and the results can differ with each pipeline used. The major problem in the statistical analysis of RNA-Seq data, and of omics data in general, is the small sample size relative to the large number of variables. In addition, the experimental design must be taken into account, and few tools consider it.

Results: We propose OMICfpp, a method for the statistical analysis of RNA-Seq data from paired designs. First, we obtain a p-value for each case-control pair using a binomial test. These p-values are aggregated using an ordered weighted average (OWA) with a previously chosen orness. The aggregated p-value from the original data is compared with the aggregated p-values obtained by applying the same method to random pairs, which are generated using between-pairs and complete randomization distributions. The resulting randomization p-value is used as a raw p-value to test the differential expression of each gene. The OMICfpp method is evaluated using public data sets of 68 sample pairs from patients with colorectal cancer. We validate our results through a bibliographic search of the reported genes and using a simulated data set. Furthermore, we compare our results with those obtained by edgeR and DESeq2 for paired samples. Finally, we propose new target genes to be validated as gene expression signatures in colorectal cancer. OMICfpp is available at http://www.uv.es/ayala/software/OMICfpp_0.2.tar.gz .
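The steps described above (per-pair binomial test, OWA aggregation, comparison against random pairs) can be illustrated for a single gene with the following minimal Python sketch. It is not the OMICfpp implementation, and several details are assumptions made only for illustration: the binomial test is taken to be a two-sided exact test of the case count against the pair total with probability 0.5; the OWA weights are a simple mixture of the uniform weights with the max or min operator, chosen because it reproduces the requested orness exactly; and only a complete randomization (pooling all samples and re-pairing them at random) is shown. All function names are hypothetical.

```python
import math
import random


def binom_two_sided_p(x, y):
    """Exact two-sided binomial p-value for counts x and y (assumed null: p = 0.5)."""
    n = x + y
    if n == 0:
        return 1.0
    # The null distribution is symmetric, so double the smaller tail and cap at 1.
    k = min(x, y)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def owa_weights(n, orness):
    """Illustrative OWA weights with the requested orness (assumed construction).

    Mixes uniform weights (orness 0.5) with the max operator (orness 1)
    or the min operator (orness 0); orness is linear in the weights.
    """
    if n == 1:
        return [1.0]
    t = abs(2.0 * orness - 1.0)
    weights = [(1.0 - t) / n] * n
    weights[0 if orness >= 0.5 else n - 1] += t
    return weights


def orness_of(weights):
    """Orness of an OWA weight vector applied to values sorted in decreasing order."""
    n = len(weights)
    return sum((n - 1 - i) * w for i, w in enumerate(weights)) / (n - 1)


def owa(values, weights):
    """Ordered weighted average: weights are applied to the decreasingly sorted values."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def aggregated_p(cases, controls, weights):
    """Aggregate the per-pair binomial p-values of one gene with an OWA."""
    pair_p = [binom_two_sided_p(x, y) for x, y in zip(cases, controls)]
    return owa(pair_p, weights)


def randomization_p(cases, controls, orness=0.5, n_rand=999, seed=0):
    """Randomization p-value for one gene under an assumed complete randomization."""
    rng = random.Random(seed)
    n = len(cases)
    weights = owa_weights(n, orness)
    observed = aggregated_p(cases, controls, weights)
    pooled = list(cases) + list(controls)
    hits = 0
    for _ in range(n_rand):
        rng.shuffle(pooled)  # re-pair all samples at random
        if aggregated_p(pooled[:n], pooled[n:], weights) <= observed:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (hits + 1) / (n_rand + 1)


if __name__ == "__main__":
    # Hypothetical counts for one gene in six case-control pairs.
    cases = [100, 120, 90, 110, 105, 130]
    controls = [20, 25, 15, 30, 22, 18]
    print(randomization_p(cases, controls, orness=0.5, n_rand=199))
```

The exact binomial sum is kept for transparency and is slow for large counts; for realistic library sizes, a library routine such as `scipy.stats.binomtest` would be a more practical choice.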

Conclusions: Our study shows that OMICfpp is an accurate method for differential expression analysis of RNA-Seq data with a paired design. In addition, we propose a graphic of the randomized p-value pattern as a powerful and robust way to select target genes for experimental validation.

Keywords: Colorectal cancer; Ordered weighted average; Randomization distribution.

MeSH terms

  • Colorectal Neoplasms / genetics
  • Colorectal Neoplasms / pathology
  • High-Throughput Nucleotide Sequencing
  • Humans
  • RNA / chemistry
  • RNA / metabolism*
  • Sequence Analysis, RNA / methods*
  • Transcriptome
  • User-Computer Interface*

Substances

  • RNA