The computational approaches of lncRNA identification based on coding potential: Status quo and challenges

Comput Struct Biotechnol J. 2020 Nov 19:18:3666-3677. doi: 10.1016/j.csbj.2020.11.030. eCollection 2020.

Abstract

Long noncoding RNAs (lncRNAs) make up a large proportion of the transcriptome in eukaryotes and have been shown to perform many regulatory functions in various biological processes. The first step in studying lncRNAs is to distinguish them accurately and specifically from large, compositionally complex transcriptome data, which contain mRNAs, lncRNAs, small RNAs and their primary transcripts. Given the huge and steadily expanding volume of transcriptome data, in silico approaches based on machine learning and probability statistics offer a practical way to pick out lncRNA candidates effectively and rapidly. In this review, we mainly discuss the algorithms and features used by currently developed approaches. We also outline the traits of some state-of-the-art tools to make them easier to use. Finally, we point out the challenges that new experimental data pose for lncRNA identification.

Keywords: Algorithm; Coding potential; Feature; In silico; LncRNA identification; sORF.

Publication types

  • Review