From Gigabyte to Kilobyte: A Bioinformatics Protocol for Mining Large RNA-Seq Transcriptomics Data

PLoS One. 2015 Apr 22;10(4):e0125000. doi: 10.1371/journal.pone.0125000. eCollection 2015.

Abstract

RNA-Seq techniques use next-generation sequencing (NGS) to generate hundreds of millions of short RNA reads. These reads can be mapped to reference genomes to investigate changes in gene expression, but improved procedures are needed for mining large RNA-Seq datasets to extract valuable biological knowledge. RNAMiner, a multi-level bioinformatics protocol and pipeline, has been developed for such datasets. It includes five steps: mapping RNA-Seq reads to a reference genome, calculating gene expression values, identifying differentially expressed genes, predicting gene functions, and constructing gene regulatory networks. To demonstrate its utility, we applied RNAMiner to datasets generated from human, mouse, Arabidopsis thaliana, and Drosophila melanogaster cells. It successfully identified differentially expressed genes, clustered them into cohesive functional groups, and constructed novel gene regulatory networks. The RNAMiner web service is available at http://calla.rnet.missouri.edu/rnaminer/index.html.
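The following minimal Python sketch illustrates the second and third steps (calculating expression values and identifying differentially expressed genes). It is not RNAMiner's own code, and the abstract does not state which expression measure or test RNAMiner uses. The sketch makes several assumptions. Per-gene read counts from the mapping step are already available. Expression is normalized as RPKM (reads per kilobase of gene per million mapped reads), one common measure. Genes are flagged when their absolute log2 fold change meets a cutoff. The function names, the pseudocount of 1, the 1.0 cutoff, and the use of summed gene counts as the library size are all hypothetical choices made for illustration.

```python
import math
from typing import Dict


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of gene per Million mapped reads."""
    # Normalize by gene length (in kilobases) and library size (in millions of reads).
    return read_count / ((gene_length_bp / 1e3) * (total_mapped_reads / 1e6))


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    # The pseudocount (an illustrative assumption) avoids log(0) and division by zero
    # for genes with no reads in one condition.
    return math.log2((treated + pseudocount) / (control + pseudocount))


def differentially_expressed(
    counts_control: Dict[str, int],
    counts_treated: Dict[str, int],
    gene_lengths: Dict[str, int],
    min_abs_lfc: float = 1.0,
) -> Dict[str, float]:
    """Return {gene: log2 fold change} for genes passing a fold-change cutoff.

    Assumption: library size is approximated by the sum of per-gene counts.
    """
    total_control = sum(counts_control.values())
    total_treated = sum(counts_treated.values())
    hits: Dict[str, float] = {}
    for gene, length in gene_lengths.items():
        expr_control = rpkm(counts_control.get(gene, 0), length, total_control)
        expr_treated = rpkm(counts_treated.get(gene, 0), length, total_treated)
        lfc = log2_fold_change(expr_treated, expr_control)
        if abs(lfc) >= min_abs_lfc:
            hits[gene] = lfc
    return hits


if __name__ == "__main__":
    # Hypothetical toy data: three genes, two conditions.
    lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
    control = {"geneA": 100, "geneB": 400, "geneC": 500}
    treated = {"geneA": 400, "geneB": 400, "geneC": 200}
    for gene, lfc in differentially_expressed(control, treated, lengths).items():
        print(f"{gene}\t{lfc:+.2f}")
```

In practice, differential expression is assessed with biological replicates and a statistical test, as in tools such as Cuffdiff, DESeq2, or edgeR, rather than with a bare fold-change cutoff. The threshold here only shows how normalized expression values feed into the comparison step.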

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Arabidopsis / genetics
  • Computational Biology / methods*
  • Data Mining*
  • Databases, Genetic
  • Drosophila melanogaster / genetics
  • Gene Expression Profiling*
  • Gene Regulatory Networks
  • Genome
  • Humans
  • Internet
  • Mice
  • Sequence Analysis, RNA / methods*
  • Software*
  • Statistics as Topic*