Comparative analysis of protein-coding and long non-coding transcripts based on RNA sequence features

J Bioinform Comput Biol. 2018 Apr;16(2):1840013. doi: 10.1142/S0219720018400139.

Abstract

RNA plays an important role in the life of the cell and in the organism as a whole. Besides the well-established protein-coding RNAs (messenger RNAs, mRNAs), long non-coding RNAs (lncRNAs) have recently gained the attention of researchers. Although lncRNAs are classified as non-coding, some authors have reported the presence of corresponding sequences in ribosome profiling (Ribo-seq) data. Ribo-seq is a powerful experimental technique used to characterize RNA translation in the cell, with a focus on initiation (harringtonine, lactimidomycin) or elongation (cycloheximide). By exploiting translation starts obtained from the Ribo-seq experiment, we developed a novel position weight matrix model for the prediction of translation starts. This model allowed us to achieve 96% accuracy in discriminating between human mRNAs and lncRNAs. When the same model was used to predict putative ORFs in RNAs, we discovered that the majority of lncRNAs contained only small ORFs ([Formula: see text][Formula: see text]nt), in contrast to mRNAs.
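
As an illustration of the general position weight matrix approach mentioned in the abstract, the following is a minimal Python sketch of a generic log-odds PWM built from aligned windows around known translation starts and used to score candidate AUG codons. It is not the authors' IPSmatrix algorithm, whose details are not given here. The function names, the window layout (the `offset` of the AUG within the window), the uniform background frequency, and the pseudocount are all assumptions made for the example, and the training windows are invented toy data.

```python
import math

BASES = "ACGU"  # sequences are assumed to contain only these four symbols


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length, aligned RNA windows.

    Each column maps a base to log2(observed frequency / background).
    The uniform background and the pseudocount are illustrative choices.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_start(pwm, rna, offset):
    """Return (score, aug_index) for the highest-scoring AUG, or None.

    `offset` is the position of the A of AUG inside the PWM window.
    Only windows that fit entirely inside `rna` are considered.
    """
    best = None
    width = len(pwm)
    for i in range(len(rna) - width + 1):
        aug = i + offset
        if rna[aug:aug + 3] != "AUG":
            continue
        s = score_window(pwm, rna[i:i + width])
        if best is None or s > best[0]:
            best = (s, aug)
    return best


# Toy training windows (hypothetical): 3 nt upstream, AUG, 3 nt downstream.
toy_sites = ["GCCAUGGCG", "ACCAUGGAA", "GCCAUGGCU", "ACCAUGGGC"]
toy_pwm = build_pwm(toy_sites)

# A toy transcript with two AUG codons in different sequence contexts.
toy_rna = "UUUAUGUUUGCCAUGGCGUUU"
result = best_start(toy_pwm, toy_rna, offset=3)
```

In this sketch the score of the best candidate start could serve as one feature for separating transcript classes, but the threshold, the window width, and the training set would all have to come from real Ribo-seq-derived starts rather than the toy data used here.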

Keywords: Human mRNAs; IPSmatrix algorithm; discriminant analysis; human lncRNAs; position weight matrix approach; small ORFs.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions
  • 5' Untranslated Regions
  • Algorithms
  • Computational Biology / methods*
  • Open Reading Frames
  • Protein Biosynthesis
  • Proteins / genetics*
  • RNA, Long Noncoding*
  • RNA, Messenger / genetics
  • Ribosomes / genetics
  • Sequence Analysis, RNA

Substances

  • 3' Untranslated Regions
  • 5' Untranslated Regions
  • Proteins
  • RNA, Long Noncoding
  • RNA, Messenger