Comparative analysis of protein-coding and long non-coding transcripts based on RNA sequence features

Oxana A Volkova; Yury V Kondrakhin; Timur A Kashapov; Ruslan N Sharipov

doi:10.1142/S0219720018400139

J Bioinform Comput Biol. 2018 Apr;16(2):1840013. doi: 10.1142/S0219720018400139.

Authors

Oxana A Volkova¹, Yury V Kondrakhin²˒³, Timur A Kashapov³, Ruslan N Sharipov³˒⁴

Affiliations

¹ Laboratory of Gene Engineering, The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, Prosp. Acad. Lavrentyeva, 10, Novosibirsk 630090, Russia.
² Laboratory of Bioinformatics, Institute of Computational Technologies, The Siberian Branch of the Russian Academy of Sciences, Ul. Acad. Rzhanova, 6, Novosibirsk 630090, Russia.
³ BIOSOFT.RU, Ltd, Ul. Russkaya, 41/1, Novosibirsk 630058, Russia.
⁴ Novosibirsk State University, Ul. Pirogova, 2, Novosibirsk 630090, Russia.

PMID: 29739305
DOI: 10.1142/S0219720018400139

Abstract

RNA plays an important role in the life of the cell and of the organism as a whole. Besides the well-established protein-coding RNAs (messenger RNAs, mRNAs), long non-coding RNAs (lncRNAs) have recently attracted considerable research attention. Although lncRNAs are classified as non-coding, some authors have reported the presence of the corresponding sequences in ribosome profiling (Ribo-seq) data. Ribo-seq is a powerful experimental technique for characterizing RNA translation in cells, with a focus on initiation (harringtonine, lactimidomycin) and elongation (cycloheximide). Using translation starts obtained from the Ribo-seq experiment, we developed a novel position weight matrix model for predicting translation starts. This model discriminated between human mRNAs and lncRNAs with 96% accuracy. When the same model was applied to predict putative ORFs in RNAs, we found that, in contrast to mRNAs, the majority of lncRNAs contained only small ORFs ([Formula: see text][Formula: see text] nt).
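The abstract does not spell out the IPSmatrix algorithm, so the following is only a generic sketch of the position weight matrix idea it builds on, not the authors' method. A log2-odds PWM is estimated from aligned windows around known start codons (in the paper these come from Ribo-seq; here the four training windows are made-up, Kozak-like toy sequences), every AUG in a transcript is scored with that matrix, and an ORF is read from each start that scores above a threshold. The window size (6 nt upstream, 1 nt downstream of AUG), the pseudocount, the uniform background, and the threshold are illustrative assumptions, and the helper names (`build_pwm`, `predict_orfs`, etc.) are hypothetical. The discriminant analysis and the paper's small-ORF length cutoff are not reproduced.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

BASES = "ACGU"                      # RNA alphabet; inputs are upper-cased and T -> U
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code stop codons


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA letters (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def build_pwm(sites: Sequence[str], pseudocount: float = 0.5,
              background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Estimate a log2-odds PWM from equal-length, aligned start-site windows."""
    rna_sites = [to_rna(s) for s in sites]
    length = len(rna_sites[0])
    if any(len(s) != length for s in rna_sites):
        raise ValueError("all training windows must have the same length")
    bg = background or {b: 0.25 for b in BASES}   # uniform background (assumption)
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}  # pseudocounts avoid log(0)
        for s in rna_sites:
            counts[s[i]] += 1                     # KeyError for letters outside ACGU
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / bg[b]) for b in BASES})
    return pwm


def score_window(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds; higher means more start-site-like."""
    return sum(column[base] for column, base in zip(pwm, window))


def orf_end(rna: str, start: int) -> Optional[int]:
    """Exclusive end of the in-frame ORF beginning at `start` (stop codon included)."""
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return i + 3
    return None                                   # no in-frame stop: ORF runs off the end


def predict_orfs(seq: str, pwm: List[Dict[str, float]], threshold: float,
                 upstream: int = 6, downstream: int = 1) -> List[Tuple[int, int, int, float]]:
    """Return (start, end, length_nt, score) for AUGs whose context scores >= threshold."""
    rna = to_rna(seq)
    if len(pwm) != upstream + 3 + downstream:
        raise ValueError("PWM length does not match the scoring window")
    results = []
    pos = rna.find("AUG")
    while pos != -1:
        lo, hi = pos - upstream, pos + 3 + downstream
        if lo >= 0 and hi <= len(rna):            # skip AUGs without a full window
            score = score_window(pwm, rna[lo:hi])
            if score >= threshold:
                end = orf_end(rna, pos)
                if end is not None:               # keep only ORFs closed by a stop codon
                    results.append((pos, end, end - pos, score))
        pos = rna.find("AUG", pos + 1)
    return results


# Toy training windows: 6 nt upstream + AUG + 1 nt downstream (made-up, Kozak-like).
toy_sites = ["GCCACCAUGG", "GCCGCCAUGG", "ACCACCAUGG", "GCCACCAUGA"]
pwm = build_pwm(toy_sites)
for start, end, length, score in predict_orfs("GCCACCATGGCTTAAGG", pwm, threshold=0.0):
    print(start, end, length, round(score, 2))
```

On this toy transcript the sketch reports a single 9-nt ORF (AUG GCU UAA) starting at index 6, while an AUG in a poor context such as `UUUUUUAUGUAA` scores below zero and is discarded. Comparing the longest predicted ORF per transcript against a small-ORF length cutoff would mimic, in a crude way, the mRNA/lncRNA contrast described above.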

Keywords: Human mRNAs; IPSmatrix algorithm; discriminant analysis; human lncRNAs; position weight matrix approach; small ORFs.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

3' Untranslated Regions
5' Untranslated Regions
Algorithms
Computational Biology / methods*
Open Reading Frames
Protein Biosynthesis
Proteins / genetics*
RNA, Long Noncoding*
RNA, Messenger / genetics
Ribosomes / genetics
Sequence Analysis, RNA

Substances

3' Untranslated Regions
5' Untranslated Regions
Proteins
RNA, Long Noncoding
RNA, Messenger