Explaining protein-protein interactions with knowledge graph-based semantic similarity

Rita T Sousa; Sara Silva; Catia Pesquita

doi:10.1016/j.compbiomed.2024.108076

Comput Biol Med. 2024 Mar:170:108076. doi: 10.1016/j.compbiomed.2024.108076. Epub 2024 Feb 1.

Authors

Rita T Sousa¹, Sara Silva², Catia Pesquita²

Affiliations

¹ LASIGE, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal. Electronic address: risousa@fc.ul.pt.
² LASIGE, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal.

PMID: 38308873

Abstract

The use of artificial intelligence and machine learning methods in biomedical applications, such as protein-protein interaction prediction, has gained significant traction in recent decades. However, explainability is a key aspect of using machine learning as a tool for scientific discovery. Explainable artificial intelligence approaches help clarify algorithmic mechanisms and identify potential bias in the data. Given the complexity of the biomedical domain, explanations should be grounded in domain knowledge, which can be achieved by using ontologies and knowledge graphs. These knowledge graphs express knowledge about a domain by capturing different perspectives of the representation of real-world entities. However, the most popular way to explore knowledge graphs with machine learning is through embeddings, which are not explainable. As an alternative, knowledge graph-based semantic similarity offers the advantage of being explainable. Additionally, similarity can be computed to capture different semantic aspects within the knowledge graph, which increases the explainability of predictive approaches. We propose a novel method to generate explainable vector representations, KGsim2vec, that uses aspect-oriented semantic similarity features to represent pairs of entities in a knowledge graph. Our approach employs a set of machine learning models, including decision trees, genetic programming, random forest and eXtreme gradient boosting, to predict relations between entities. The experiments reveal that considering multiple semantic aspects when representing the similarity between two entities improves both explainability and predictive performance. KGsim2vec performs better than black-box methods based on knowledge graph embeddings or graph neural networks. Moreover, KGsim2vec produces global models that can capture biological phenomena and elucidate data biases.
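
The idea of representing a pair of entities by one similarity value per semantic aspect can be illustrated with a minimal sketch. This is not the authors' implementation: the toy ontology, the aspect names ("BP", "MF"), the protein identifiers, and the choice of Jaccard similarity over ancestor-closed annotation sets are all assumptions made for illustration, and the abstract does not state which similarity measures KGsim2vec uses.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy ontology: aspect -> {term -> set of parent terms}.
PARENTS: Dict[str, Dict[str, Set[str]]] = {
    "BP": {
        "bp_root": set(),
        "signaling": {"bp_root"},
        "kinase_cascade": {"signaling"},
        "metabolism": {"bp_root"},
    },
    "MF": {
        "mf_root": set(),
        "binding": {"mf_root"},
        "atp_binding": {"binding"},
        "catalysis": {"mf_root"},
    },
}

# Hypothetical annotations: protein -> aspect -> directly annotated terms.
ANNOTATIONS: Dict[str, Dict[str, Set[str]]] = {
    "P1": {"BP": {"kinase_cascade"}, "MF": {"atp_binding"}},
    "P2": {"BP": {"signaling"}, "MF": {"atp_binding"}},
    "P3": {"BP": {"metabolism"}, "MF": {"catalysis"}},
}


def with_ancestors(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given terms together with all of their ancestors."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def aspect_similarity(a: str, b: str, aspect: str) -> float:
    """Jaccard similarity of two proteins within one aspect (assumed measure)."""
    parents = PARENTS[aspect]
    terms_a = with_ancestors(ANNOTATIONS[a].get(aspect, set()), parents)
    terms_b = with_ancestors(ANNOTATIONS[b].get(aspect, set()), parents)
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def pair_features(a: str, b: str) -> List[float]:
    """One similarity feature per aspect, in alphabetical aspect order."""
    return [aspect_similarity(a, b, aspect) for aspect in sorted(PARENTS)]


if __name__ == "__main__":
    print(pair_features("P1", "P2"))
    print(pair_features("P1", "P3"))
```

Because each position in the vector corresponds to a named aspect, a model such as a decision tree trained on these vectors would split on conditions like "similarity in aspect MF above some threshold", which a domain expert can read directly. An embedding dimension offers no such reading.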

Keywords: Explainable artificial intelligence; Knowledge graph; Machine learning; Protein–protein interaction prediction; Semantic similarity.

MeSH terms

Artificial Intelligence*
Machine Learning
Neural Networks, Computer
Pattern Recognition, Automated
Semantics*