The identification and characterization of novel transcripts from RNA-seq data

Brief Bioinform. 2016 Jul;17(4):678-85. doi: 10.1093/bib/bbv067. Epub 2015 Aug 17.

Abstract

Owing largely to advances in next-generation sequencing (NGS), the amount of NGS data is increasing rapidly. Among the many NGS applications, one of the most commonly used, RNA sequencing (RNA-seq), is rapidly replacing microarray-based techniques in laboratories around the world. As such techniques become standardized, technicians can perform these experiments with minimal hands-on time and fewer experimental and operator-dependent biases, and the bottleneck has clearly shifted to data analysis. Further complicating matters, increasing evidence suggests that most of the genome is transcribed into RNA, yet the majority of these RNAs are not translated into proteins. RNAs that are not translated into proteins are called noncoding RNAs (ncRNAs). Although some time has passed since ncRNAs were discovered, their annotations remain poor, which makes RNA-seq data challenging to analyze. Here, we examine the current limitations of RNA-seq analysis through case studies focused on detecting novel transcripts and examining their characteristics. Finally, we validate the presence of novel transcripts with biological experiments, showing that novel transcripts can be accurately identified when a series of filters is applied. In conclusion, novel transcripts identified from RNA-seq data must be examined carefully before proceeding to biological experiments.

Keywords: RNA-seq; gene expression; lncRNA; novel transcripts; transcriptome assembly.

MeSH terms

  • Base Sequence
  • High-Throughput Nucleotide Sequencing
  • RNA / genetics*
  • Sequence Analysis, RNA
  • Transcriptome

Substances

  • RNA