Predicting Human lncRNA-Disease Associations Based on Geometric Matrix Completion

IEEE J Biomed Health Inform. 2020 Aug;24(8):2420-2429. doi: 10.1109/JBHI.2019.2958389. Epub 2019 Dec 9.

Abstract

Increasing evidence reveals that dysregulation of long non-coding RNAs (lncRNAs) is relevant to diverse diseases. However, the number of experimentally verified lncRNA-disease associations is limited. Prioritizing potential associations benefits not only disease diagnosis and treatment but, more importantly, the understanding of disease mechanisms at the lncRNA level. Various computational methods have been proposed, but predicting associations precisely and making full use of the data's intrinsic structure remain challenging. In this work, we design a new method, named GMCLDA (Geometric Matrix Completion lncRNA-Disease Association), to infer underlying associations based on geometric matrix completion. By utilizing association patterns among functionally similar lncRNAs and phenotypically similar diseases, GMCLDA makes use of the intrinsic structure embedded in the association matrix. In addition, limiting the range of the predicted values introduces a degree of sparsity into the computation and enhances the robustness of GMCLDA. GMCLDA computes disease semantic similarity according to the Disease Ontology (DO) hierarchy and lncRNA Gaussian interaction profile kernel similarity according to known interaction profiles. Then, GMCLDA measures lncRNA sequence similarity using the Needleman-Wunsch algorithm. For a new lncRNA, GMCLDA prefills its interaction profile based on its K-nearest neighbors as defined by sequence similarity. Finally, GMCLDA estimates the missing entries of the association matrix based on the geometric matrix completion model. Compared with state-of-the-art methods, GMCLDA provides more accurate lncRNA-disease predictions. Further case studies show that GMCLDA is able to correctly infer possible lncRNAs for renal cancer.
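
Two of the preprocessing steps named in the abstract, the Gaussian interaction profile kernel and the K-nearest-neighbor prefill for a new lncRNA, can be illustrated with a minimal Python sketch. The abstract does not give formulas, so several details here are assumptions: the kernel bandwidth is normalized by the mean squared norm of the profiles (a commonly used convention), the prefill is a similarity-weighted average of neighbor profiles, and the function names, parameters (`gamma_prime`, `k`), and toy data are hypothetical. This is not the authors' implementation.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a binary matrix.

    Assumption: the bandwidth is gamma_prime divided by the mean squared
    norm of the profiles; the abstract does not specify this detail.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else gamma_prime
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


def knn_prefill(profiles, seq_sim, new_idx, k=2):
    """Estimate the profile of lncRNA `new_idx` from its k nearest neighbors.

    Assumption: neighbors are ranked by sequence similarity and their
    profiles are averaged with the similarities as weights.
    """
    # Only lncRNAs that already have at least one known association can help.
    candidates = [
        j for j in range(len(profiles)) if j != new_idx and any(profiles[j])
    ]
    neighbors = sorted(
        candidates, key=lambda j: seq_sim[new_idx][j], reverse=True
    )[:k]
    total = sum(seq_sim[new_idx][j] for j in neighbors)
    if total == 0:
        return list(profiles[new_idx])  # nothing to borrow from
    n_cols = len(profiles[new_idx])
    return [
        sum(seq_sim[new_idx][j] * profiles[j][c] for j in neighbors) / total
        for c in range(n_cols)
    ]


# Toy lncRNA-by-disease association matrix (rows: lncRNAs, columns: diseases).
associations = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],  # a new lncRNA with no known associations
]

# Hypothetical pairwise sequence similarities, e.g. normalized alignment scores.
sequence_similarity = [
    [1.0, 0.6, 0.1, 0.8],
    [0.6, 1.0, 0.2, 0.2],
    [0.1, 0.2, 1.0, 0.1],
    [0.8, 0.2, 0.1, 1.0],
]

kernel = gip_kernel(associations[:3])
prefilled = knn_prefill(associations, sequence_similarity, new_idx=3, k=2)
print(prefilled)
```

With this toy data, the new lncRNA borrows mostly from its closest neighbor (similarity 0.8), so its prefilled profile leans toward the first and third diseases. In a full pipeline the prefilled row would replace the all-zero row before the matrix completion step.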

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Databases, Factual
  • Female
  • Genetic Predisposition to Disease* / epidemiology
  • Genetic Predisposition to Disease* / genetics
  • Humans
  • Male
  • Medical Informatics
  • Neoplasms / epidemiology
  • Neoplasms / genetics
  • RNA, Long Noncoding / genetics*

Substances

  • RNA, Long Noncoding