HetIG-PreDiG: A Heterogeneous Integrated Graph Model for Predicting Human Disease Genes based on gene expression

Kathleen M Jagodnik; Yael Shvili; Alon Bartal

doi:10.1371/journal.pone.0280839


PLoS One. 2023 Feb 15;18(2):e0280839. doi: 10.1371/journal.pone.0280839. eCollection 2023.

Authors

Kathleen M Jagodnik¹ ² ³, Yael Shvili⁴, Alon Bartal¹

Affiliations

¹ The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel.
² Department of Psychiatry, Harvard Medical School, Boston, MA, United States of America.
³ Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States of America.
⁴ Department of Surgery A, Meir Medical Center, Kfar Sava, Israel.

Abstract

Graph analytical approaches permit the identification of novel genes involved in complex diseases. However, existing methods are limited by one or more of the following: (i) inferring structural network similarity only among connected gene nodes, ignoring potentially relevant unconnected nodes; (ii) using homogeneous graphs, which miss the complexity of gene-disease associations; (iii) relying on similarities of disease/gene-phenotype associations, which involve highly incomplete data; (iv) using binary classification with gene-disease edges as positive training samples and non-associated gene and disease nodes as negative samples, although the latter may include currently unknown disease genes; or (v) reporting predicted novel associations without systematically evaluating their accuracy. To address these limitations, we develop the Heterogeneous Integrated Graph for Predicting Disease Genes (HetIG-PreDiG) model, which includes gene-gene, gene-disease, and gene-tissue associations. We predict novel disease genes using low-dimensional node representations that account for network structure. We then extend beyond network structure with the Gene-Disease Prioritization Score (GDPS), which reflects the degree of gene-disease association based on gene co-expression data. As negative training samples, we select non-associated gene and disease node pairs with lower GDPS, which are less likely to be affiliated. We evaluate the model's success in predicting novel disease genes by analyzing the prediction probabilities of gene-disease associations. HetIG-PreDiG predicts gene-disease associations with a Micro-F1 of 0.95 and outperforms baseline models. Its predictions are validated against published literature, advancing our understanding of complex genetic diseases.
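The abstract describes two evaluation-relevant steps: choosing negative training pairs that have a low gene-disease prioritization score, and reporting Micro-F1. The sketch below illustrates both ideas under loud assumptions. The abstract does not give the GDPS formula, so `coexpression_score` is a hypothetical stand-in that averages co-expression between a candidate gene and the disease's known genes. The function names and the toy data are likewise illustrative and are not the authors' implementation.

```python
from statistics import mean


def coexpression_score(gene, disease_genes, coexpr):
    """HYPOTHETICAL stand-in for GDPS (not the paper's formula): mean
    co-expression between `gene` and genes already linked to the disease.
    `coexpr` maps frozenset({gene_a, gene_b}) -> co-expression value."""
    values = [coexpr.get(frozenset((gene, g)), 0.0) for g in disease_genes]
    return mean(values) if values else 0.0


def sample_negatives(genes, disease_to_genes, coexpr, n):
    """Return the n non-associated (gene, disease) pairs with the lowest
    score, i.e. the pairs assumed least likely to be true associations."""
    candidates = []
    for disease, known in disease_to_genes.items():
        for gene in genes:
            if gene in known:
                continue  # known associations are positives, never negatives
            score = coexpression_score(gene, known, coexpr)
            candidates.append((score, gene, disease))
    candidates.sort()  # ascending score; ties broken by gene, then disease name
    return [(gene, disease) for _, gene, disease in candidates[:n]]


def micro_f1(y_true, y_pred, labels=(0, 1)):
    """Micro-averaged F1: pool TP/FP/FN over all labels, then compute
    2TP / (2TP + FP + FN)."""
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    genes = ["A", "B", "C", "D"]
    disease_to_genes = {"d1": {"A", "B"}}
    coexpr = {
        frozenset(("A", "C")): 0.9,
        frozenset(("B", "C")): 0.7,
        frozenset(("A", "D")): 0.1,
        frozenset(("B", "D")): 0.05,
    }
    print(sample_negatives(genes, disease_to_genes, coexpr, n=1))  # [('D', 'd1')]
    print(micro_f1([1, 0, 1, 0], [1, 0, 0, 0]))  # 0.75
```

Gene C is strongly co-expressed with the known d1 genes, so it is kept out of the negative set as a possible undiscovered disease gene. Only the weakly co-expressed gene D is used as a negative. For single-label classification, Micro-F1 pooled over all classes equals accuracy, which is why the toy example returns 0.75 (three of four predictions correct).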

Copyright: © 2023 Jagodnik et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computational Biology* / methods
Gene Expression
Humans

Grants and funding

K.M.J. was supported by a Mortimer B. Zuckerman STEM Leadership Program post-doctoral fellowship in the School of Business Administration at Bar-Ilan University and in the Departments of Psychiatry at Harvard Medical School and Massachusetts General Hospital. We thank Bar-Ilan University’s Data Science Institute (DSI) for partially supporting this research.