Prediction of potential miRNA-disease associations based on stacked autoencoder

Chun-Chun Wang; Tian-Hao Li; Li Huang; Xing Chen

doi:10.1093/bib/bbac021

Brief Bioinform. 2022 Mar 10;23(2):bbac021. doi: 10.1093/bib/bbac021.

Authors

Chun-Chun Wang¹ ², Tian-Hao Li¹, Li Huang³ ⁴, Xing Chen²

Affiliations

¹ School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China.
² Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China.
³ Academy of Arts and Design, Tsinghua University, Beijing, 10084, China.
⁴ The Future Laboratory, Tsinghua University, Beijing, 10084, China.

PMID: 35176761
DOI: 10.1093/bib/bbac021

Abstract

In recent years, a growing number of biological experiments and scientific studies have demonstrated that microRNAs (miRNAs) play an important role in the development of complex human diseases. Discovering miRNA-disease associations can therefore contribute to accurate diagnosis and effective treatment of diseases. Identifying miRNA-disease associations through computational methods based on biological data has proven to be low-cost and efficient. In this study, we proposed a computational model named Stacked Autoencoder for potential MiRNA-Disease Association prediction (SAEMDA). In SAEMDA, all the miRNA-disease samples were used to pretrain a Stacked Autoencoder (SAE) in an unsupervised manner. Then, after an output layer with a softmax classifier was added to the SAE, the positive samples and the same number of selected negative samples were used to fine-tune the SAE in a supervised manner. SAEMDA can thus make full use of the feature information of all unlabeled miRNA-disease pairs, which makes it suitable for our dataset, containing few labeled samples and many unlabeled samples. SAEMDA achieved AUCs of 0.9210 and 0.8343 in global and local leave-one-out cross-validation, respectively. In addition, SAEMDA obtained an average AUC and standard deviation of 0.9102 ± 0.0029 over 100 repetitions of 5-fold cross-validation. These results were better than those of previous models. Moreover, we carried out three case studies to further demonstrate the predictive accuracy of SAEMDA. In these case studies, 82% (breast neoplasms), 100% (lung neoplasms) and 90% (esophageal neoplasms) of the top 50 predicted miRNAs were verified by databases. Thus, SAEMDA could be a useful and reliable model to predict potential miRNA-disease associations.
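
The sample construction and the AUC metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `build_balanced_samples` and `auc`, the 0/1 association-matrix input and the uniform random choice of negatives are all assumptions made for illustration; the abstract says only that negatives were "selected" and does not describe how.

```python
import random


def build_balanced_samples(assoc, seed=0):
    """Return (positives, negatives, all_pairs) from a 0/1 association matrix.

    assoc: list of rows (one per miRNA) with one 0/1 entry per disease;
    1 marks a known association. This layout is a hypothetical input format.
    """
    all_pairs = [(m, d) for m, row in enumerate(assoc) for d in range(len(row))]
    positives = [(m, d) for (m, d) in all_pairs if assoc[m][d] == 1]
    candidates = [(m, d) for (m, d) in all_pairs if assoc[m][d] == 0]
    rng = random.Random(seed)
    # Assumption: negatives are drawn uniformly at random from pairs with no
    # known association; this requires len(candidates) >= len(positives).
    negatives = rng.sample(candidates, len(positives))
    # all_pairs would feed unsupervised pretraining; positives and negatives
    # would feed supervised fine-tuning.
    return positives, negatives, all_pairs


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` evaluates to 0.75, because three of the four positive-negative pairs are ranked correctly. Repeating such an evaluation over many random 5-fold splits and summarizing the results with `statistics.mean` and `statistics.stdev` would give a figure of the "average ± standard deviation" form reported above.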

Keywords: Stacked Autoencoder; association prediction; disease; fine-tuning; microRNA; pretraining.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Breast Neoplasms*
Computational Biology / methods
Female
Genetic Predisposition to Disease
Humans
Lung Neoplasms* / genetics
MicroRNAs* / genetics

Substances

MicroRNAs