GCNfold: A novel lightweight model with valid extractors for RNA secondary structure prediction

Comput Biol Med. 2023 Sep:164:107246. doi: 10.1016/j.compbiomed.2023.107246. Epub 2023 Jul 10.

Abstract

RNA secondary structure is essential for predicting tertiary structure and understanding RNA function. Recent research tends to stack numerous modules to build large deep-learning models. This can raise accuracy above 70%, but it also incurs significant training costs and reduces prediction efficiency. We propose GCNfold, a model with three feature extractors. The Structure Extractor uses a three-layer Graph Convolutional Network (GCN) to mine the structural information of RNA, such as stems, hairpins, and internal loops. Structure and Sequence Fusion embeds structural information into sequences with Transformer encoders. The Long-distance Dependency Extractor captures long-range pairwise relationships with a UNet. Experiments indicate that, among all models with over 80% accuracy, GCNfold has a small number of parameters, a fast inference speed, and high accuracy. Additionally, GCNfold-Small takes only 90 ms to infer an RNA secondary structure and achieves close to 90% accuracy on average. The GCNfold code is available on GitHub at https://github.com/EnbinYang/GCNfold.
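To make the idea of a three-layer GCN over RNA structure more concrete, the following is a minimal NumPy sketch, not the GCNfold implementation. It assumes that a candidate structure is available in dot-bracket notation, that the graph has one node per nucleotide with backbone and base-pair edges, and that each layer applies the standard symmetric-normalized propagation rule followed by ReLU. The function names, the hidden width of 8, and the random weights are all hypothetical; the abstract does not specify how the graph is built or how the layers are parameterized.

```python
import numpy as np


def dot_bracket_to_adjacency(structure):
    """Build a symmetric adjacency matrix with backbone and base-pair edges."""
    n = len(structure)
    adj = np.zeros((n, n), dtype=float)
    # Backbone edges: consecutive nucleotides are connected.
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    # Base-pair edges: match opening and closing brackets with a stack.
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            adj[i, j] = adj[j, i] = 1.0
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def one_hot(sequence, alphabet="ACGU"):
    """Encode each nucleotide as a one-hot node feature vector."""
    feats = np.zeros((len(sequence), len(alphabet)))
    for i, base in enumerate(sequence):
        feats[i, alphabet.index(base)] = 1.0
    return feats


def gcn_forward(adj_norm, features, weights):
    """Apply one propagation step per weight matrix, each followed by ReLU."""
    h = features
    for w in weights:
        h = np.maximum(adj_norm @ h @ w, 0.0)
    return h


# Toy hairpin: a 3-base-pair stem closing a 4-nucleotide loop.
sequence = "GGGAAAUCCC"
structure = "(((....)))"

rng = np.random.default_rng(0)
hidden = 8  # hypothetical hidden width
# Three weight matrices give the three GCN layers (random, untrained).
weights = [
    rng.normal(size=(4, hidden)),
    rng.normal(size=(hidden, hidden)),
    rng.normal(size=(hidden, hidden)),
]

adj = dot_bracket_to_adjacency(structure)
adj_norm = normalize_adjacency(adj)
embeddings = gcn_forward(adj_norm, one_hot(sequence), weights)
print(embeddings.shape)
```

With three propagation steps, each nucleotide's embedding can mix information from nodes up to three edges away, so paired bases in a stem influence each other even when they are far apart in the sequence. In a pipeline like the one described, such per-nucleotide embeddings would then be passed to the sequence fusion stage.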

Keywords: Graph convolutional network; Knowledge and data integration; RNA secondary structure prediction; Transformer; UNet.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Protein Structure, Secondary
  • RNA* / genetics

Substances

  • RNA