MSFF-CDCGAN: A novel method to predict RNA secondary structure based on Generative Adversarial Network

Methods. 2022 Aug:204:368-375. doi: 10.1016/j.ymeth.2022.04.004. Epub 2022 Apr 28.

Abstract

Access to RNA secondary structure is a prerequisite for understanding and mastering RNA function. RNA secondary structures play an important role in cells: they can cause or contribute to neurological disorders and can be applied in the medical field. However, experimental methods for obtaining RNA secondary structure are costly, laborious and not universally applicable. Although computational methods can predict the secondary structure of short-sequence RNAs relatively accurately, they cannot predict long-sequence RNAs and pseudoknots, which is the current bottleneck of RNA secondary structure prediction. In recent years, researchers have attempted to use deep learning algorithms to predict RNA secondary structure and have achieved results. However, the small amount of data on the secondary structure of long-sequence RNAs leads to low accuracy when deep learning methods predict the secondary structure of RNAs across races. Similarly, RNA structures with pseudoknots are very complex, and insufficient data causes deep learning algorithms to struggle to predict the secondary structure of RNAs containing pseudoknots. In this paper, the RNA data are encoded into grayscale images by a unique encoding method based on the real RNA secondary structure and sequence information. The image data are then expanded to increase the amount of RNA data, which addresses the problem of insufficient data for predicting long sequences and RNA secondary structures with pseudoknots in current deep learning methods and provides a good data foundation for deep learning. The article proposes a multi-scale feature fusion Conditional Deep Convolutional Generative Adversarial Network prediction model (MSFF-CDCGAN), based on an improved Conditional Deep Convolutional Generative Adversarial Network (CDCGAN) model, to predict RNA secondary structure. The experimental results showed that the MSFF-CDCGAN model could predict long-sequence RNAs and pseudoknots more accurately than traditional prediction methods.
This paper introduces the Generative Adversarial Network (GAN) to RNA secondary structure prediction for the first time. It uses a unique image encoding approach to expand the original RNA data set, thereby transforming the structure prediction problem into an image analysis problem and effectively addressing the bottleneck in RNA secondary structure prediction.
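
The abstract does not specify how sequence and structure are mapped to pixel values. To give a rough idea of what such an encoding could look like, the following is a minimal sketch under stated assumptions, not the authors' method: it reads a structure in dot-bracket notation (with extra bracket types for pseudoknots) and writes each base pair into a symmetric grayscale matrix. The names `PAIR_INTENSITY`, `parse_pairs` and `encode_image`, and the intensity values themselves, are hypothetical choices made for illustration.

```python
# Illustrative sketch only: the intensity values and function names below are
# assumptions, not the encoding used in the paper.
PAIR_INTENSITY = {
    frozenset("GC"): 255,  # G-C pair
    frozenset("AU"): 170,  # A-U pair
    frozenset("GU"): 85,   # G-U wobble pair
}
OTHER_PAIR = 40            # any other annotated pair
BRACKETS = {")": "(", "]": "[", "}": "{"}  # closing bracket -> opening bracket


def parse_pairs(structure):
    """Return sorted (i, j) base pairs from dot-bracket notation.

    Each bracket type has its own stack, so crossing pairs (pseudoknots)
    written with a second bracket type are parsed correctly.
    """
    stacks = {opener: [] for opener in BRACKETS.values()}
    pairs = []
    for j, symbol in enumerate(structure):
        if symbol in stacks:
            stacks[symbol].append(j)
        elif symbol in BRACKETS:
            stack = stacks[BRACKETS[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def encode_image(sequence, structure):
    """Encode a sequence and its structure as an n x n grayscale matrix.

    Pixel (i, j) is 0 when bases i and j are unpaired with each other and a
    pair-type-dependent intensity when they form a base pair.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    sequence = sequence.upper()
    n = len(sequence)
    image = [[0] * n for _ in range(n)]
    for i, j in parse_pairs(structure):
        pair = frozenset((sequence[i], sequence[j]))
        value = PAIR_INTENSITY.get(pair, OTHER_PAIR)
        image[i][j] = image[j][i] = value  # keep the matrix symmetric
    return image


hairpin = encode_image("GGGAAACCC", "(((...)))")
knot = encode_image("GACU", "([)]")  # crossing pairs (0, 2) and (1, 3)
```

A matrix of this kind can be saved or treated as a single-channel image, which is one way the prediction task can be recast as an image problem; the paper's own encoding and its data expansion step may differ substantially.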

Keywords: CDCGAN; Deep Learning; Generative Adversarial Network; RNA Secondary Structure Prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Image Processing, Computer-Assisted
  • Protein Structure, Secondary
  • RNA* / chemistry
  • RNA* / genetics
  • Sequence Analysis, RNA / methods

Substances

  • RNA