A probabilistic knowledge graph for target identification

PLoS Comput Biol. 2024 Apr 5;20(4):e1011945. doi: 10.1371/journal.pcbi.1011945. eCollection 2024 Apr.

Abstract

Early identification of safe and efficacious disease targets is crucial to alleviating the tremendous cost of drug discovery projects. However, existing experimental methods for identifying new targets are generally labor-intensive and failure-prone. Computational approaches, by contrast, especially machine learning-based frameworks, have shown remarkable potential in drug discovery. In this work, we propose Progeni, a novel machine learning-based framework for target identification. In addition to fully exploiting known heterogeneous biological networks from various sources, Progeni integrates literature evidence about the relations between biological entities to construct a probabilistic knowledge graph. Graph neural networks are then employed to learn feature embeddings of the biological entities, which facilitates the identification of biologically relevant target candidates. A comprehensive evaluation demonstrated that Progeni has superior predictive power over the baseline methods on the target identification task. In addition, our extensive tests showed that Progeni is highly robust to the negative effect of exposure bias, a common phenomenon in recommendation systems, and effectively identifies new targets that are strongly supported by the literature. Moreover, our wet lab experiments validated the biological significance of the top target candidates predicted by Progeni for melanoma and colorectal cancer. Together, these results suggest that Progeni can identify biologically effective targets and thus provides a powerful and useful tool for advancing the drug discovery process.
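
To make the idea of a probabilistic knowledge graph concrete, the following is a minimal illustrative sketch, not the Progeni implementation. It assumes that each piece of literature evidence carries a confidence score, that independent pieces of evidence for the same relation are merged with a noisy-OR rule, and that entity features are updated by a single probability-weighted mean aggregation over neighbors (a generic message-passing step). The entity names, relation label, scores, and function names are all hypothetical.

```python
from collections import defaultdict


def noisy_or(confidences):
    """Merge independent evidence scores into one edge probability (assumed rule)."""
    p_no_support = 1.0
    for c in confidences:
        p_no_support *= 1.0 - c
    return 1.0 - p_no_support


def build_graph(evidence):
    """Group (head, relation, tail, confidence) records into weighted triples."""
    grouped = defaultdict(list)
    for head, relation, tail, confidence in evidence:
        grouped[(head, relation, tail)].append(confidence)
    return {triple: noisy_or(scores) for triple, scores in grouped.items()}


def propagate(graph, features):
    """One round of probability-weighted mean aggregation, edges treated as undirected."""
    # Each node starts with its own features at weight 1 (a self-loop).
    sums = {node: list(vec) for node, vec in features.items()}
    weights = {node: 1.0 for node in features}
    for (head, _relation, tail), prob in graph.items():
        for src, dst in ((head, tail), (tail, head)):
            for i, value in enumerate(features[src]):
                sums[dst][i] += prob * value
            weights[dst] += prob
    return {node: [v / weights[node] for v in sums[node]] for node in features}


# Hypothetical data: two literature mentions supporting the same relation.
evidence = [
    ("GENE_A", "associated_with", "melanoma", 0.5),
    ("GENE_A", "associated_with", "melanoma", 0.5),
]
features = {"GENE_A": [1.0, 0.0], "melanoma": [0.0, 1.0]}

graph = build_graph(evidence)
updated = propagate(graph, features)
print(graph)
print(updated)
```

In this sketch, weaker or less frequently reported relations contribute less to a neighbor's updated features, which is one simple way that edge uncertainty could influence learned embeddings.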

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Colorectal Neoplasms
  • Computational Biology* / methods
  • Drug Discovery* / methods
  • Humans
  • Machine Learning*
  • Melanoma
  • Neural Networks, Computer*
  • Probability

Grants and funding

This work was supported in part by the National Natural Science Foundation of China (T2125007 to J.Z., 32270640 to D.Z., and 82073161, 32270982, 82241234 to H.T.), the National Key Research and Development Program of China (2021YFF1201300 to J.Z.), the New Cornerstone Science Foundation through the XPLORER PRIZE (J.Z.), the Research Center for Industries of the Future (RCIF) at Westlake University (J.Z.), the Westlake Education Foundation (J.Z.), the "Pioneer" and "Leading Goose" R&D Program of Zhejiang (2024SSYS0036), the National Youth Talent Support Program (to H.T.), the Senior and Junior Technological Innovation Team (20210509055RQ), the Fundamental Research Funds for the Central Universities, JLU, and the Jilin Provincial Key Laboratory of Big Data Intelligent Computing (20180622002JC). The funders played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.