Predicting RNA secondary structure via adaptive deep recurrent neural networks with energy-based filter

Weizhong Lu; Ye Tang; Hongjie Wu; Hongmei Huang; Qiming Fu; Jing Qiu; Haiou Li

doi:10.1186/s12859-019-3258-7

BMC Bioinformatics. 2019 Dec 24;20(Suppl 25):684. doi: 10.1186/s12859-019-3258-7.

Authors

Weizhong Lu¹, Ye Tang¹, Hongjie Wu²,³, Hongmei Huang¹, Qiming Fu¹, Jing Qiu¹, Haiou Li¹

Affiliations

¹ School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu, 215000, China.
² School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu, 215000, China. Hongjie.wu@qq.com.
³ Anhui Key Laboratory of Intelligent Building Energy Efficiency, Anhui Jianzhu University, Hefei, Anhui, 230601, China. Hongjie.wu@qq.com.

Abstract

Background: RNA secondary structure prediction is an important problem in structural bioinformatics, and predicting pseudoknotted RNA secondary structure is NP-hard. Recently, many machine-learning methods, Markov models, and neural networks have been applied to this problem with encouraging predictive accuracy; however, their performance is usually limited by over-fitting and by the learning model's requirement for a fixed number of training features. Because most natural biological sequences vary in length, they must be truncated before their features can be fed to the learning model, which not only loses information but also destroys the integrity of the biological sequence.
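To make the truncation problem concrete, the following is a minimal sketch of what a fixed-length input does to sequences of different lengths. The function names, the example sequences, and the value of `FIXED_LEN` are illustrative assumptions, not taken from the paper.

```python
def truncate_to_fixed_length(seq: str, length: int) -> str:
    """Cut a sequence to at most `length` bases, as a fixed-size input would require."""
    return seq[:length]


def lost_bases(seq: str, length: int) -> int:
    """Count the bases discarded by truncation (0 if the sequence already fits)."""
    return max(0, len(seq) - length)


# Hypothetical example sequences of different lengths.
sequences = ["GGGAAACCC", "GCGCUUCGGCGCAAUU"]
FIXED_LEN = 10  # assumed fixed input size, chosen only for illustration

truncated = [truncate_to_fixed_length(s, FIXED_LEN) for s in sequences]
dropped = [lost_bases(s, FIXED_LEN) for s in sequences]

for original, cut, n in zip(sequences, truncated, dropped):
    print(f"{original} -> {cut} (dropped {n} bases)")
```

Any base pair whose partner lies in the dropped tail can no longer be predicted at all, which is the loss of sequence integrity described above.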

Results: To address this problem, we propose a deep-learning model that adapts to sequence length and integrate an energy-based filter to remove over-fitted base pairs.
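The abstract does not give implementation details, so the following is only a rough sketch of the two ideas under stated assumptions. Padding with a mask is one common way to let a recurrent model accept variable-length input without truncation, and it may differ from the authors' adaptive scheme. The filter uses a toy per-pair score table (`TOY_PAIR_ENERGY`) with made-up values, not real thermodynamic parameters, and a minimum hairpin loop of three unpaired bases. All names here are hypothetical.

```python
from typing import List, Tuple

PAD = "N"  # hypothetical padding symbol

# Toy scores for illustration only; NOT real thermodynamic parameters.
TOY_PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def pad_batch(seqs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Pad every sequence to the longest one in the batch and return a 0/1 mask.

    The mask marks real bases (1) versus padding (0), so no base is truncated.
    """
    max_len = max(len(s) for s in seqs)
    padded = [s + PAD * (max_len - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask


def energy_filter(
    seq: str,
    pairs: List[Tuple[int, int]],
    threshold: float = 0.0,
    min_loop: int = 3,
) -> List[Tuple[int, int]]:
    """Keep predicted pairs (i, j), i < j, whose toy energy is below `threshold`.

    Pairs that would enclose fewer than `min_loop` bases are dropped as well.
    """
    kept = []
    for i, j in pairs:
        if j - i - 1 < min_loop:
            continue  # hairpin loop too short
        energy = TOY_PAIR_ENERGY.get((seq[i], seq[j]), float("inf"))
        if energy < threshold:
            kept.append((i, j))
    return kept


padded, mask = pad_batch(["GGGAAACCC", "GCAU"])
predicted = [(0, 8), (1, 7), (2, 6), (3, 5), (0, 4)]  # hypothetical model output
filtered = energy_filter("GGGAAACCC", predicted)
print(padded, mask[1], filtered)
```

In this toy run the pair (3, 5) is removed because its loop is too short and (0, 4) is removed because G-A has no entry in the score table, leaving the three G-C pairs of the stem.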

Conclusions: Comparative experiments on the authoritative RNA STRAND dataset (RNA secondary STRucture and statistical Analysis Database) showed 12% higher accuracy than three currently used methods.

Keywords: LSTM; Pseudoknots; RNA; Recurrent neural network; Secondary structure prediction.

MeSH terms

Base Pairing
Neural Networks, Computer*
Nucleic Acid Conformation
RNA / chemistry*
Thermodynamics

Substances

RNA