NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification

IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):557-565. doi: 10.1109/TCBB.2021.3131136. Epub 2023 Feb 3.

Abstract

Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among these classes, Y RNAs have been gaining attention as essential factors for the initiation of DNA replication in vertebrates and as potential tumor biomarkers. Homologs have also been described in nematodes and insects, and related sequences have been found in bacteria. Methods capable of accurately predicting Y RNA transcripts are lacking. In this work, we developed an attention-based LSTM network and built a model that classifies sncRNAs (including Y RNA) directly from nucleotide sequences. We built a dataset of 45,447 sncRNA sequences from a wide range of organisms, obtained from Rfam 14.3. Performance evaluation demonstrated that our proposed method, NCYPred (Non-Coding/Y RNA Prediction), can accurately predict Y RNA sequences and their homologs, as well as 11 additional classes, achieving results comparable with state-of-the-art methods. We also demonstrate that applying t-SNE to learned sequence representations could be useful for sequence analysis. Our model is freely available as a web server (https://www.gpea.uem.br/ncypred/).
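As a rough illustration of what classifying "directly from nucleotide sequences" with an attention mechanism can involve, the following minimal sketch encodes a sequence as integer tokens and applies softmax attention pooling over per-position vectors. It is not the NCYPred implementation: the vocabulary, the padding scheme, the scoring vector `w`, and the function names `encode` and `attention_pool` are all assumptions made for illustration, and the per-position vectors stand in for the outputs a bidirectional LSTM would produce.

```python
import math

# Hypothetical vocabulary (an assumption, not taken from the paper):
# 0 is reserved for padding and 5 for any unrecognized symbol.
VOCAB = {"A": 1, "C": 2, "G": 3, "U": 4}
PAD, UNKNOWN = 0, 5


def encode(seq, max_len):
    """Map a nucleotide string to a fixed-length list of integer tokens."""
    # Treat DNA-style T as U, truncate long sequences, pad short ones.
    tokens = [VOCAB.get(ch, UNKNOWN) for ch in seq.upper().replace("T", "U")]
    tokens = tokens[:max_len]
    return tokens + [PAD] * (max_len - len(tokens))


def attention_pool(states, mask, w):
    """Softmax-weighted sum of per-position vectors, skipping padded positions.

    states: list of equal-length vectors (stand-ins for BiLSTM outputs)
    mask:   list of booleans, True for real (non-padded) positions;
            at least one position is assumed to be real
    w:      hypothetical scoring vector with the same length as each state
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in states]
    # Subtract the largest real score for numerical stability.
    top = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, states)) for d in range(dim)]
    return pooled, weights


# Example usage with an arbitrary short sequence (not a real Y RNA).
tokens = encode("GGCUGGUCCGA", 16)
mask = [t != PAD for t in tokens]
# One-hot vectors used here only as placeholders for learned hidden states.
states = [[1.0 if t == k else 0.0 for k in range(1, 6)] for t in tokens]
pooled, weights = attention_pool(states, mask, w=[0.1, 0.2, 0.3, 0.4, 0.0])
```

In a trained network the pooled vector would typically feed a classification layer, and fixed-length vectors of this kind are what a method such as t-SNE could project to two dimensions for visual inspection.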

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Bacteria / genetics
  • Computers
  • RNA, Small Untranslated* / genetics
  • Sequence Analysis, RNA

Substances

  • RNA, Small Untranslated