Knowledge graph embedding for profiling the interaction between transcription factors and their target genes

PLoS Comput Biol. 2023 Jun 20;19(6):e1011207. doi: 10.1371/journal.pcbi.1011207. eCollection 2023 Jun.

Abstract

Interactions between transcription factors and their target genes form the main part of the human gene regulatory network, yet they remain a complicating factor in biological research. Specifically, for nearly half of the interactions recorded in established databases, the interaction type has yet to be confirmed. Although several computational methods exist to predict gene interactions and their types, no available method predicts them solely from topology information. To this end, we propose a graph-based prediction model called KGE-TGI, trained in a multi-task learning manner on a knowledge graph that we constructed specifically for this problem. The KGE-TGI model relies on topology information rather than on gene expression data. In this paper, we formulate the task of predicting the interaction types of transcription factors and target genes as a multi-label classification problem for link types on a heterogeneous graph, coupled with a second, inherently related link prediction problem. We constructed a ground-truth dataset as a benchmark and evaluated the proposed method on it. In 5-fold cross-validation experiments, the proposed method achieved average AUC values of 0.9654 and 0.9339 on the link prediction and link type classification tasks, respectively. In addition, a series of comparison experiments shows that introducing knowledge information significantly benefits prediction and that our methodology achieves state-of-the-art performance on this problem.
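The multi-task formulation described above can be illustrated with a minimal sketch. It is not the KGE-TGI implementation: the dot-product link scorer, the DistMult-style type scorer, the choice of two interaction types, the `type_weight` parameter, and all function names are assumptions made for illustration only. It shows one way a link prediction loss and a multi-label link type loss could be combined for a single transcription factor and gene pair.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one probability/label pair."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def link_score(tf_vec, gene_vec):
    """Assumed scorer for link existence: dot product of the two embeddings."""
    return sum(a * b for a, b in zip(tf_vec, gene_vec))


def type_scores(tf_vec, gene_vec, type_vecs):
    """Assumed DistMult-style scorer: one score per interaction type."""
    return [
        sum(a * r * b for a, r, b in zip(tf_vec, rel, gene_vec))
        for rel in type_vecs
    ]


def multitask_loss(tf_vec, gene_vec, type_vecs, link_label, type_labels,
                   type_weight=1.0):
    """Link prediction loss plus a weighted multi-label link type loss.

    The type loss is counted only for pairs that are true links
    (link_label == 1), since non-links have no interaction type.
    """
    link_loss = bce(sigmoid(link_score(tf_vec, gene_vec)), link_label)
    type_probs = [sigmoid(s) for s in type_scores(tf_vec, gene_vec, type_vecs)]
    # Multi-label: each type is an independent binary decision.
    type_loss = sum(bce(p, y) for p, y in zip(type_probs, type_labels))
    type_loss /= len(type_labels)
    return link_loss + type_weight * link_label * type_loss


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings and two hypothetical types.
    tf = [0.5, -0.2, 0.1]
    gene = [0.4, 0.3, -0.6]
    types = [[1.0, 0.5, -0.3], [-0.8, 0.2, 0.9]]
    print(multitask_loss(tf, gene, types, link_label=1, type_labels=[1, 0]))
```

Treating each interaction type as an independent sigmoid output, rather than using a softmax over types, is what makes the type task multi-label: a single pair can carry more than one type label at once.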

MeSH terms

  • Algorithms
  • Databases, Factual
  • Gene Ontology
  • Gene Regulatory Networks
  • Humans
  • Pattern Recognition, Automated*
  • Proteome
  • Systems Biology
  • Transcription Factors* / genetics

Substances

  • Transcription Factors
  • Proteome

Grants and funding

This work was supported by the National Key R&D Program of China (2020YFA0908700 (JQ L)), the National Natural Science Foundation of China (62176164 (ZH D)), the Natural Science Foundation of Guangdong Province (2023A1515010992 (ZH D)), the Science and Technology Innovation Committee Foundation of Shenzhen City (JCYJ20220531101217039 (ZH D)), the Shenzhen Scientific Research and Development Funding Program (GGFW2018020518310863 (ZH D)), the Guangdong "Pearl River Talent Recruitment Program" (2019ZT08X603 (VCM L)), the Guangdong "Pearl River Talent Plan" (2019JC01X235 (VCM L)), the Shenzhen Talents Special Project-Guangdong Provincial Innovation and Entrepreneurship Team Supporting Project (2021344612 (ZH D)) and the Shenzhen Science and Technology Innovation Commission (R2020A045 (ZH D)). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.