Ensembles of knowledge graph embedding models improve predictions for drug discovery

Brief Bioinform. 2022 Nov 19;23(6):bbac481. doi: 10.1093/bib/bbac481.

Abstract

Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at least one benchmark in order to be published. Despite this, dozens of novel architectures are published every year, making it challenging for users, even within the field, to deduce the most suitable configuration for a given application. A typical biomedical application of KGEMs is drug-disease prediction in the context of drug discovery, in which a KGEM is trained to predict triples linking drugs and diseases. These predictions can be later tested in clinical trials following extensive experimental validation. However, given the infeasibility of evaluating each of these predictions and that only a minimal number of candidates can be experimentally tested, models that yield higher precision on the top prioritized triples are preferred. In this paper, we apply the concept of ensemble learning on KGEMs for drug discovery to assess whether combining the predictions of several models can lead to an overall improvement in predictive performance. First, we trained and benchmarked 10 KGEMs to predict drug-disease triples on two independent biomedical KGs designed for drug discovery. Following, we applied different ensemble methods that aggregate the predictions of these models by leveraging the distribution or the position of the predicted triple scores. We then demonstrate how the ensemble models can achieve better results than the original KGEMs by benchmarking the precision (i.e., number of true positives prioritized) of their top predictions. Lastly, we released the source code presented in this work at https://github.com/enveda/kgem-ensembles-in-drug-discovery.

Keywords: drug discovery; ensemble learning; knowledge graph embedding models; knowledge graphs; link prediction; machine learning.

MeSH terms

  • Drug Discovery*
  • Knowledge
  • Machine Learning
  • Pattern Recognition, Automated*
  • Software