SMALF: miRNA-disease associations prediction based on stacked autoencoder and XGBoost

BMC Bioinformatics. 2021 Apr 28;22(1):219. doi: 10.1186/s12859-021-04135-2.

Abstract

Background: Identifying miRNA-disease associations helps us understand the mechanisms of disease at the molecular level. However, identifying these associations through biological experiments is usually blind, time-consuming, and small-scale. Hence, developing computational methods to predict unknown miRNA-disease associations is becoming increasingly important.

Results: In this work, we develop a computational framework called SMALF to predict unknown miRNA-disease associations. SMALF first uses a stacked autoencoder to learn latent features of miRNAs and of diseases from the original miRNA-disease association matrix. Then, SMALF constructs a feature vector for each miRNA-disease pair by integrating miRNA functional similarity, the miRNA latent feature, disease semantic similarity, and the disease latent feature. Finally, XGBoost is used to predict unknown miRNA-disease associations. In cross-validation experiments, SMALF achieved the best AUC value compared with other state-of-the-art methods. We also conducted three case studies, on hepatocellular carcinoma, colon cancer, and breast cancer. For these diseases, 10, 10, and 9 of the top ten predicted miRNAs, respectively, are verified in MNDR v3.0 or miRCancer.
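The feature-construction steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The layer sizes, learning rate, epoch count, greedy layer-wise training scheme, sigmoid encoder with linear decoder, and the random toy association and similarity matrices are all assumptions made for illustration. The helper names (`train_autoencoder_layer`, `stacked_autoencoder_features`, `pair_features`, `toy_similarity`) are hypothetical. Each row of the association matrix (one miRNA) is compressed by the stacked autoencoder, and the transposed matrix is compressed in the same way for diseases. Each miRNA-disease pair then becomes the concatenation [miRNA functional similarity row, miRNA latent feature, disease semantic similarity row, disease latent feature].

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder_layer(X, hidden_dim, epochs=200, lr=0.1, seed=0):
    """Train one autoencoder layer (sigmoid encoder, linear decoder) by
    full-batch gradient descent on squared reconstruction error
    (summed over features, averaged over rows). Returns encoder weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, size=(d, hidden_dim))
    b_enc = np.zeros(hidden_dim)
    W_dec = rng.normal(0.0, 0.1, size=(hidden_dim, d))
    b_dec = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)          # encode
        X_hat = H @ W_dec + b_dec               # decode (linear)
        grad_out = 2.0 * (X_hat - X) / n        # dLoss/dX_hat
        grad_W_dec = H.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W_dec.T) * H * (1.0 - H)  # back through sigmoid
        grad_W_enc = X.T @ grad_pre
        grad_b_enc = grad_pre.sum(axis=0)
        W_enc -= lr * grad_W_enc
        b_enc -= lr * grad_b_enc
        W_dec -= lr * grad_W_dec
        b_dec -= lr * grad_b_dec
    return W_enc, b_enc


def stacked_autoencoder_features(X, layer_dims, **kwargs):
    """Greedy layer-wise stacking: each layer reconstructs the previous
    layer's hidden output; the last hidden layer is the latent feature."""
    H = X
    for k, dim in enumerate(layer_dims):
        W, b = train_autoencoder_layer(H, dim, seed=k, **kwargs)
        H = sigmoid(H @ W + b)
    return H


def pair_features(pairs, mirna_sim, mirna_latent, disease_sim, disease_latent):
    """Concatenate similarity rows and latent features for each (miRNA, disease) pair."""
    return np.array([
        np.concatenate([mirna_sim[i], mirna_latent[i], disease_sim[j], disease_latent[j]])
        for i, j in pairs
    ])


# Toy data (assumption): random binary associations and symmetric similarities.
rng = np.random.default_rng(42)
n_mirna, n_disease = 30, 20
A = (rng.random((n_mirna, n_disease)) < 0.1).astype(float)


def toy_similarity(n):
    S = rng.random((n, n))
    S = (S + S.T) / 2.0
    np.fill_diagonal(S, 1.0)
    return S


MFS = toy_similarity(n_mirna)    # stands in for miRNA functional similarity
DSS = toy_similarity(n_disease)  # stands in for disease semantic similarity

mirna_latent = stacked_autoencoder_features(A, [16, 8])
disease_latent = stacked_autoencoder_features(A.T, [12, 8])

pairs = [(i, j) for i in range(n_mirna) for j in range(n_disease)]
X = pair_features(pairs, MFS, mirna_latent, DSS, disease_latent)
y = np.array([A[i, j] for i, j in pairs])  # 1 = known association
print(X.shape)  # (600, 66): 30 + 8 + 20 + 8 features per pair
```

The matrix `X` and labels `y` would then go to the classifier. With the `xgboost` Python package, that step would look like `xgboost.XGBClassifier().fit(X, y)` followed by `predict_proba` on unlabeled pairs. The abstract does not describe how negative samples are chosen or which hyperparameters are used, so those choices remain open here. In a cross-validation setting, the latent features would typically be learned from the training-fold association matrix only, so that held-out associations do not leak into the features.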

Conclusion: The comprehensive experimental results demonstrate that SMALF is effective in identifying unknown miRNA-disease associations.

Keywords: Latent feature; Stacked autoencoder; XGBoost; miRNA-disease associations.

MeSH terms

  • Algorithms
  • Breast Neoplasms* / genetics
  • Computational Biology
  • Humans
  • MicroRNAs* / genetics

Substances

  • MicroRNAs