Ensemble of Deep Recurrent Neural Networks for Identifying Enhancers via Dinucleotide Physicochemical Properties

Kok Keng Tan; Nguyen Quoc Khanh Le; Hui-Yuan Yeh; Matthew Chin Heng Chua

doi:10.3390/cells8070767

Cells. 2019 Jul 23;8(7):767. doi: 10.3390/cells8070767.

Authors

Kok Keng Tan¹, Nguyen Quoc Khanh Le², Hui-Yuan Yeh³, Matthew Chin Heng Chua⁴

Affiliations

¹ Institute of Systems Science, 25 Heng Mui Keng Terrace, National University of Singapore, Singapore 119615, Singapore.
² Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore 639798, Singapore.
³ Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore 639798, Singapore. hyyeh@ntu.edu.sg.
⁴ Institute of Systems Science, 25 Heng Mui Keng Terrace, National University of Singapore, Singapore 119615, Singapore. mattchua@nus.edu.sg.

Abstract

Enhancers are short deoxyribonucleic acid (DNA) fragments that play an important role in gene expression. Because an enhancer may lie far from the gene it acts upon, identifying enhancers is difficult. Many published works have focused on identifying enhancers from their sequence information; however, the resulting performance still needs improvement. This study proposes an ensemble of classifiers, built from deep recurrent neural networks, for predicting enhancers. The input features of the deep ensemble networks were generated from six types of dinucleotide physicochemical properties, which outperformed the other features. In summary, our ensemble model identified enhancers with a sensitivity of 75.5%, specificity of 76%, accuracy of 75.5%, and Matthews correlation coefficient (MCC) of 0.51. For classifying enhancers as strong or weak, our model reached a sensitivity of 83.15%, specificity of 45.61%, accuracy of 68.49%, and MCC of 0.312. Compared with the benchmark results, our model performed better on most evaluation metrics. The results show that deep model ensembles have the potential to improve on the best results achieved to date with shallow machine learning methods.
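
For readers less familiar with the four metrics reported above, the following is a minimal sketch of how sensitivity, specificity, accuracy, and MCC are conventionally computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' own evaluation code may differ.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positives (e.g. enhancers) predicted correctly/incorrectly.
    tn/fp: negatives (e.g. non-enhancers) predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not the study's data).
example = binary_metrics(tp=80, fn=20, tn=70, fp=30)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so a classifier that favors one class scores low on it. This is consistent with the strong-versus-weak task above, where high sensitivity paired with low specificity yields a comparatively modest MCC.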

Keywords: biocomputing; dinucleotide physicochemical properties; enhancer DNA; ensemble deep learning; gene expression; high performance; transcription factor.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Computational Biology
DNA / chemistry*
Databases, Genetic
Datasets as Topic
Dinucleoside Phosphates / chemistry*
Enhancer Elements, Genetic*
Machine Learning
Neural Networks, Computer

Substances

Dinucleoside Phosphates
DNA