GAERF: predicting lncRNA-disease associations by graph auto-encoder and random forest

Brief Bioinform. 2021 Sep 2;22(5):bbaa391. doi: 10.1093/bib/bbaa391.

Abstract

Predicting disease-related long non-coding RNAs (lncRNAs) helps identify new biomarkers for the prevention, diagnosis and treatment of complex human diseases. In this paper, we propose GAERF, a machine learning-based classification approach that identifies disease-related lncRNAs using a graph auto-encoder (GAE) and a random forest (RF). First, we combine the relationships among lncRNAs, miRNAs and diseases into a heterogeneous network. Then, a GAE learns low-dimensional representation vectors of the nodes from this network, which reduces the dimensionality and heterogeneity of the biological data. Taking these feature vectors as input, we train an RF classifier to predict new lncRNA-disease associations (LDAs). Experimental results show that the learned representations characterize lncRNA-disease pairs accurately. GAERF achieves superior performance owing to the ensemble learning method and significantly outperforms other methods. Moreover, case studies further demonstrate that GAERF is an effective method for predicting LDAs.
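The three steps described in the abstract (heterogeneous network, GAE embedding, RF classification) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the toy random association matrices, the block layout of the adjacency matrix, the one-layer linear GCN encoder with an inner-product decoder trained by plain gradient descent, the embedding size, the concatenation of lncRNA and disease embeddings as pair features, and all hyperparameters. The helper names `build_hetero_adjacency`, `normalize_adjacency` and `train_gae` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_lnc, n_mi, n_dis = 12, 8, 6  # toy sizes (assumption)

# Toy binary association matrices: random placeholders, not real data.
LD = (rng.random((n_lnc, n_dis)) < 0.3).astype(float)  # lncRNA-disease
LM = (rng.random((n_lnc, n_mi)) < 0.3).astype(float)   # lncRNA-miRNA
MD = (rng.random((n_mi, n_dis)) < 0.3).astype(float)   # miRNA-disease


def build_hetero_adjacency(LD, LM, MD):
    """Stack the bipartite blocks into one symmetric adjacency matrix.

    Node order: lncRNAs, then miRNAs, then diseases.
    """
    n_l, n_d = LD.shape
    n_m = LM.shape[1]
    return np.block([
        [np.zeros((n_l, n_l)), LM, LD],
        [LM.T, np.zeros((n_m, n_m)), MD],
        [LD.T, MD.T, np.zeros((n_d, n_d))],
    ])


def normalize_adjacency(A):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])  # self-loops keep every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def train_gae(A, dim=4, epochs=200, lr=0.01, seed=0):
    """Linear one-layer GCN encoder Z = A_norm @ W (identity node features)
    with an inner-product decoder sigmoid(Z Z^T), trained by gradient
    descent on the cross-entropy between the reconstruction and A."""
    init = np.random.default_rng(seed)
    A_norm = normalize_adjacency(A)
    W = init.normal(scale=0.1, size=(A.shape[0], dim))
    for _ in range(epochs):
        Z = A_norm @ W
        recon = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
        G = recon - A                          # gradient w.r.t. the logits
        grad_W = A_norm.T @ ((G + G.T) @ Z)    # chain rule through Z Z^T
        W -= lr * grad_W / A.shape[0]
    return A_norm @ W


A = build_hetero_adjacency(LD, LM, MD)
Z = train_gae(A)
Z_lnc, Z_dis = Z[:n_lnc], Z[n_lnc + n_mi:]

# Represent each lncRNA-disease pair by concatenating its two embeddings.
pairs = [(i, j) for i in range(n_lnc) for j in range(n_dis)]
X = np.array([np.concatenate([Z_lnc[i], Z_dis[j]]) for i, j in pairs])
y = np.array([int(LD[i, j]) for i, j in pairs])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
scores = clf.predict_proba(X)[:, 1]  # association score for every pair
```

The sketch scores the same pairs it was trained on and embeds a graph that already contains every known association, so the scores only demonstrate the data flow. In a real evaluation, held-out associations would need to be removed from the network before the embeddings are learned.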

Keywords: graph auto-encoder; graph convolutional network; graph embedding; lncRNA-disease association; random forest.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers, Tumor / genetics
  • Biomarkers, Tumor / metabolism
  • Computational Biology / methods
  • Computer Graphics / statistics & numerical data
  • Decision Trees
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Lung Neoplasms / diagnosis
  • Lung Neoplasms / genetics*
  • Lung Neoplasms / metabolism
  • Lung Neoplasms / pathology
  • Machine Learning*
  • Male
  • MicroRNAs / classification
  • MicroRNAs / genetics
  • MicroRNAs / metabolism
  • Neural Networks, Computer*
  • Prostatic Neoplasms / diagnosis
  • Prostatic Neoplasms / genetics*
  • Prostatic Neoplasms / metabolism
  • Prostatic Neoplasms / pathology
  • RNA, Long Noncoding / classification
  • RNA, Long Noncoding / genetics*
  • RNA, Long Noncoding / metabolism
  • ROC Curve
  • Risk Factors
  • Stomach Neoplasms / diagnosis
  • Stomach Neoplasms / genetics*
  • Stomach Neoplasms / metabolism
  • Stomach Neoplasms / pathology

Substances

  • Biomarkers, Tumor
  • MicroRNAs
  • RNA, Long Noncoding