Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs

Int J Mol Sci. 2021 Aug 13;22(16):8719. doi: 10.3390/ijms22168719.

Abstract

Apart from protein-coding ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) that regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have greatly advanced the exploration of ncRNAs, revealing their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complex diseases such as cancer. Categorization of ncRNAs is essential to understand the mechanisms of diseases and to develop effective treatments. Information on the sub-cellular localization of ncRNAs helps clarify their diverse functions. To date, several computational methodologies have been proposed to precisely identify the class as well as the sub-cellular localization patterns of RNAs. This paper discusses different types of ncRNAs and reviews computational approaches proposed in the last 10 years to distinguish coding RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, micro RNA, long ncRNA, and circular RNA, and to determine the sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization determination datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We expect this expert analysis to help artificial intelligence researchers, on one platform, to learn the state-of-the-art performance, select models for various tasks, identify the predominantly used sequence descriptors and neural architectures, and interpret inter-species and intra-species performance deviation.

Keywords: RNA sub-cellular localization; benchmark performance; benchmark sequence analysis datasets; computational sequence analysis; deep learning; long non-coding RNA; machine learning; ncRNA; non-coding RNA classification; small non-coding RNA.

Publication types

  • Review

MeSH terms

  • Animals
  • Artificial Intelligence
  • Computational Biology / methods*
  • Databases, Factual
  • High-Throughput Nucleotide Sequencing
  • Humans
  • RNA, Untranslated / classification*
  • RNA, Untranslated / genetics
  • RNA, Untranslated / metabolism*
  • Sequence Analysis, RNA
  • Tissue Distribution

Substances

  • RNA, Untranslated