A novel target convergence set based random walk with restart for prediction of potential LncRNA-disease associations

BMC Bioinformatics. 2019 Dec 3;20(1):626. doi: 10.1186/s12859-019-3216-4.

Abstract

Background: In recent years, long non-coding RNAs (lncRNAs) have been shown to be closely related to the occurrence and development of many serious human diseases. However, most lncRNA-disease associations remain undiscovered because traditional biological experiments are costly and time-consuming. Hence, efficient and reasonable computational models are urgently needed to predict potential associations between lncRNAs and diseases.

Results: In this manuscript, a novel prediction model called TCSRWRLD is proposed to predict potential lncRNA-disease associations based on an improved random walk with restart. In TCSRWRLD, a heterogeneous lncRNA-disease network is first constructed by combining the integrated similarity of lncRNAs and the integrated similarity of diseases. Then, for each lncRNA (or disease) node in this heterogeneous network, a node set called the TCS (Target Convergence Set) is established, consisting of the top 100 disease (or lncRNA) nodes with the smallest average network distance to the disease (or lncRNA) nodes that have known associations with that node. Finally, an improved random walk with restart is run on the heterogeneous lncRNA-disease network to infer potential lncRNA-disease associations. The major contribution of this manuscript is the concept of the TCS, which speeds up the convergence of TCSRWRLD: the walker can stop as soon as the walking probability vector has reached a stable state at the nodes in the TCS, rather than at all nodes in the whole network. Simulation results show that TCSRWRLD achieves a reliable AUC of 0.8712 in Leave-One-Out Cross Validation (LOOCV), which outperforms previous state-of-the-art methods. Moreover, case studies of lung cancer and leukemia also demonstrate the satisfactory prediction performance of TCSRWRLD.
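The TCS-based stopping rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the update formula or the distance measure, so the sketch assumes the standard random walk with restart update on a column-normalised weight matrix and a precomputed distance matrix. The function names `build_tcs` and `rwr_with_tcs`, the restart probability 0.3, the tolerance, and the four-node toy network are all hypothetical choices made for illustration.

```python
import numpy as np


def build_tcs(dist, known, k):
    """Pick the k candidates with the smallest average distance to `known`.

    dist  : (n, n) matrix of pairwise network distances (assumed precomputed)
    known : indices of nodes with known associations to the query node
    k     : size of the target convergence set (100 in the abstract)
    """
    avg = np.asarray(dist, dtype=float)[:, known].mean(axis=1)
    return np.argsort(avg)[:k]


def rwr_with_tcs(adj, seed, tcs, restart=0.3, tol=1e-6, max_iter=1000):
    """Random walk with restart that stops once the TCS entries are stable.

    adj     : (n, n) non-negative weight matrix (hypothetical toy network)
    seed    : index of the start node
    tcs     : indices of the nodes in the target convergence set
    restart : restart probability (assumed value, not taken from the paper)
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    # Column-normalise so that each column is a probability distribution.
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    trans = adj / col_sums

    p0 = np.zeros(n)
    p0[seed] = 1.0
    p = p0.copy()
    tcs = np.asarray(list(tcs))

    for step in range(1, max_iter + 1):
        p_next = (1.0 - restart) * (trans @ p) + restart * p0
        # Convergence is checked only on the TCS entries, not on all n nodes.
        if np.abs(p_next[tcs] - p[tcs]).sum() < tol:
            return p_next, step
        p = p_next
    return p, max_iter


# Toy usage: a 4-node path graph, walking from node 0.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
p_tcs, steps_tcs = rwr_with_tcs(adj, seed=0, tcs=[2, 3])
p_all, steps_all = rwr_with_tcs(adj, seed=0, tcs=range(4))
```

Because the change measured on a subset of entries can never exceed the change measured on all entries, the TCS-restricted walk stops no later than the walk that checks every node, which is the source of the speed-up claimed in the abstract.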

Conclusions: Both the comparative results and the case studies demonstrate that TCSRWRLD achieves excellent performance in predicting potential lncRNA-disease associations, which suggests that TCSRWRLD may be a useful addition to bioinformatics research.

Keywords: Global set; Heterogeneous network; Potential lncRNA-disease association prediction; Random walk with restart; Target convergence set.

MeSH terms

  • Algorithms*
  • Area Under Curve
  • Computational Biology / methods*
  • Genetic Association Studies*
  • Genetic Predisposition to Disease*
  • Humans
  • Neoplasms / genetics
  • Probability
  • RNA, Long Noncoding / genetics*
  • RNA, Long Noncoding / metabolism
  • Reproducibility of Results

Substances

  • RNA, Long Noncoding