The computational approaches of lncRNA identification based on coding potential: Status quo and challenges

Jing Li; Xuan Zhang; Changning Liu

doi:10.1016/j.csbj.2020.11.030

Comput Struct Biotechnol J. 2020 Nov 19;18:3666-3677. doi: 10.1016/j.csbj.2020.11.030. eCollection 2020.

Authors

Jing Li¹,², Xuan Zhang¹, Changning Liu¹,²,³

Affiliations

¹ CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China.
² Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China.
³ The Innovative Academy of Seed Design, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China.

Abstract

Long noncoding RNAs (lncRNAs) make up a large proportion of the transcriptome in eukaryotes and have been shown to perform many regulatory functions in various biological processes. The first step in studying lncRNAs is to distinguish them accurately and specifically within large, compositionally complex transcriptome data, which contain mRNAs, lncRNAs, small RNAs and their primary transcripts. Given such large and steadily expanding transcriptome data, in silico approaches based on machine learning and probability statistics offer a practical way to filter out lncRNA targets effectively and rapidly. In this review, we mainly discussed the characteristics of the algorithms and features used in currently developed approaches. We also outlined the traits of some state-of-the-art tools for ease of operation. Finally, we pointed out the underlying challenges in lncRNA identification that arise with the advent of new experimental data.
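As a rough illustration of what a coding-potential feature can look like, the sketch below computes the length and coverage of the longest open reading frame (ORF) in a transcript. It is not taken from the review or from any specific tool: the function names `longest_orf` and `orf_features` are hypothetical, only the forward strand is scanned, and an ORF is assumed to run from an ATG to the first in-frame stop codon. In practice, features of this kind would be combined with others and passed to a classifier.

```python
# Minimal sketch (assumption, not a published method): ORF-based features
# for a single transcript sequence given as a DNA or RNA string.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (length_nt, start_index) of the longest ATG..stop ORF.

    Scans the three forward-strand reading frames only. The length includes
    the stop codon. Returns (0, -1) when no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    best_len, best_start = 0, -1
    for frame in range(3):
        start = None  # position of the first ATG seen in the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if length > best_len:
                    best_len, best_start = length, start
                start = None  # look for the next ORF in this frame
    return best_len, best_start


def orf_features(seq):
    """Return ORF length and ORF coverage (ORF length / transcript length)."""
    length, _ = longest_orf(seq)
    coverage = length / len(seq) if seq else 0.0
    return {"orf_length": length, "orf_coverage": coverage}


if __name__ == "__main__":
    print(orf_features("CCATGAAATTTGGGTAACC"))
```

A transcript with a short ORF and low ORF coverage is more consistent with a noncoding RNA, although, as the sORF keyword suggests, short ORFs alone do not settle the question.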

Keywords: Algorithm; Coding potential; Feature; In silico; LncRNA identification; sORF.

Publication types

Review