Prediction of potential miRNA-disease associations based on stacked autoencoder

Brief Bioinform. 2022 Mar 10;23(2):bbac021. doi: 10.1093/bib/bbac021.

Abstract

In recent years, a growing number of biological experiments and scientific studies have demonstrated that microRNAs (miRNAs) play an important role in the development of complex human diseases. Discovering miRNA-disease associations can therefore contribute to accurate diagnosis and effective treatment of diseases. Identifying miRNA-disease associations through computational methods based on biological data has proven to be low-cost and highly efficient. In this study, we proposed a computational model named Stacked Autoencoder for potential MiRNA-Disease Association prediction (SAEMDA). In SAEMDA, all the miRNA-disease samples were used to pretrain a Stacked Autoencoder (SAE) in an unsupervised manner. Then, after an output layer with a softmax classifier was added to the SAE, the positive samples and an equal number of selected negative samples were used to fine-tune the SAE in a supervised manner. SAEMDA can thus make full use of the feature information of all unlabeled miRNA-disease pairs, which makes it suitable for our dataset, containing few labeled samples and many unlabeled samples. SAEMDA achieved AUCs of 0.9210 and 0.8343 in global and local leave-one-out cross validation, respectively. In addition, SAEMDA obtained an average AUC and standard deviation of 0.9102 ± 0.0029 over 100 runs of 5-fold cross validation. These results were better than those of previous models. Moreover, we carried out three case studies to further demonstrate the predictive accuracy of SAEMDA. In these case studies, 82% (breast neoplasms), 100% (lung neoplasms) and 90% (esophageal neoplasms) of the top 50 predicted miRNAs were verified by databases. Thus, SAEMDA could be a useful and reliable model for predicting potential miRNA-disease associations.
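The way the two training stages draw on the data, and the AUC metric used for evaluation, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_training_sets` and `auc` and the toy association matrix are hypothetical. The abstract does not say how negative samples are selected, so uniform random sampling from the unlabeled pairs is assumed here. The autoencoder itself is omitted.

```python
import random


def build_training_sets(association, seed=0):
    """Split miRNA-disease pairs into a pretraining set and a fine-tuning set.

    association[i][j] == 1 means miRNA i has a known link to disease j;
    0 means the pair is unlabeled (not a confirmed negative).
    """
    rng = random.Random(seed)
    all_pairs = [(i, j)
                 for i, row in enumerate(association)
                 for j in range(len(row))]
    positives = [(i, j) for (i, j) in all_pairs if association[i][j] == 1]
    unlabeled = [(i, j) for (i, j) in all_pairs if association[i][j] == 0]

    # Assumption: negatives are drawn uniformly at random from the
    # unlabeled pairs, one per positive, to give a balanced labeled set.
    negatives = rng.sample(unlabeled, len(positives))

    # Stage 1 (unsupervised pretraining) sees every pair, without labels.
    pretrain_set = all_pairs
    # Stage 2 (supervised fine-tuning) sees the balanced labeled pairs.
    finetune_set = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
    rng.shuffle(finetune_set)
    return pretrain_set, finetune_set


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 miRNAs x 4 diseases with 3 known associations.
toy = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
pretrain_set, finetune_set = build_training_sets(toy)
```

In this toy case the pretraining set holds all 12 pairs, while the fine-tuning set holds 6 labeled pairs (3 positives and 3 sampled negatives). This mirrors the setting described in the abstract, where labeled samples are scarce relative to unlabeled ones.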

Keywords: Stacked Autoencoder; association prediction; disease; fine-tuning; microRNA; pretraining.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Breast Neoplasms*
  • Computational Biology / methods
  • Female
  • Genetic Predisposition to Disease
  • Humans
  • Lung Neoplasms* / genetics
  • MicroRNAs* / genetics

Substances

  • MicroRNAs