An Easy-to-Follow Pipeline for Long Noncoding RNA Identification: A Case Study in Diploid Strawberry Fragaria vesca

Methods Mol Biol. 2019:1933:223-243. doi: 10.1007/978-1-4939-9045-0_13.

Abstract

Long noncoding RNAs (lncRNAs), defined as transcripts longer than 200 nucleotides without coding potential, are a new class of regulatory molecules with roles in diverse biological processes. New lncRNAs can readily be identified by mining RNA-seq data from a wide range of plant species. However, it remains challenging to distinguish functional lncRNAs from mRNAs that code for small peptides and from nonfunctional products of pseudogenes. This chapter provides stepwise instructions, using RNA-seq datasets from developing wild strawberry fruit to illustrate each step. The workflow is divided into three parts: part I covers standard RNA-seq data processing and analysis; part II describes lncRNA identification; part III describes several approaches aimed at shedding light on lncRNA function. The description is intended for beginners, with easy-to-follow steps and text boxes that provide code and explanations. While it is relatively easy to identify lncRNAs, it is difficult to infer their function in the absence of coding information. Multiple RNA-seq libraries across tissues and stages are useful resources for inferring possible functions of lncRNAs from their expression and co-regulation.
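As a rough illustration of two ideas in the abstract, the sketch below applies the 200-nucleotide length cutoff and then ranks genes by how strongly their expression correlates with a lncRNA across libraries. It is not the chapter's code: the function names, the use of Pearson correlation, and the 0.9 cutoff are assumptions made for illustration, and the coding-potential filter is not shown.

```python
from math import sqrt

MIN_LENGTH = 200  # nucleotides; the length cutoff in the lncRNA definition


def passes_length_filter(sequence, min_length=MIN_LENGTH):
    """Return True if the transcript is longer than min_length nucleotides."""
    return len(sequence) > min_length


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant expression gives no co-regulation signal
    return cov / sqrt(var_x * var_y)


def coexpressed_genes(lnc_expr, gene_exprs, threshold=0.9):
    """Return sorted gene IDs whose |r| with the lncRNA meets the threshold.

    The 0.9 default is an illustrative assumption, not a value from the chapter.
    """
    return sorted(
        gene_id
        for gene_id, expr in gene_exprs.items()
        if abs(pearson(lnc_expr, expr)) >= threshold
    )
```

Here each expression vector would hold one value per RNA-seq library (for example, per tissue or stage), so genes returned by the hypothetical `coexpressed_genes` are candidates for co-regulation with the lncRNA, whether positively or negatively correlated.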

Keywords: Correlation analysis; Identification; RNA-seq; Strawberry; lncRNA.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Chromosomes, Plant
  • Computational Biology / methods*
  • Diploidy*
  • Fragaria / genetics*
  • Gene Expression Regulation, Plant*
  • High-Throughput Nucleotide Sequencing / methods*
  • RNA, Long Noncoding / genetics*
  • RNA, Plant / genetics*
  • Transcriptome

Substances

  • RNA, Long Noncoding
  • RNA, Plant