Learning graph representations of biochemical networks and its application to enzymatic link prediction

Bioinformatics. 2021 May 5;37(6):793-799. doi: 10.1093/bioinformatics/btaa881.

Abstract

Motivation: The characterization of enzymatic activities between molecules remains incomplete, hindering biological engineering and limiting biological discovery. In this work, we develop enzymatic link prediction (ELP), a technique for predicting the likelihood of an enzymatic transformation between two molecules. ELP models enzymatic reactions cataloged in the KEGG database as a graph. Unlike prior work, ELP uses graph embedding to learn molecular representations that capture not only molecular and enzymatic attributes but also graph connectivity.
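
As a rough illustration of the link prediction setup described above, the following sketch builds an undirected graph from substrate-product pairs and separates all molecule pairs into known links and non-links. It is a minimal sketch under stated assumptions, not the ELP implementation: the compound labels, `build_graph` and `candidate_links` are hypothetical names introduced here, and the toy pairs are not KEGG entries.

```python
from itertools import combinations

# Hypothetical substrate-product pairs; the labels are placeholders,
# not actual KEGG compound identifiers.
reactions = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]


def build_graph(pairs):
    """Return an undirected adjacency map: molecule -> set of neighbours."""
    graph = {}
    for u, v in pairs:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def candidate_links(graph):
    """Split all unordered molecule pairs into known links and non-links."""
    positives, negatives = [], []
    for u, v in combinations(sorted(graph), 2):
        (positives if v in graph[u] else negatives).append((u, v))
    return positives, negatives


graph = build_graph(reactions)
positives, negatives = candidate_links(graph)
```

A link predictor assigns a score to each pair; a useful one ranks the pairs in `positives` above those in `negatives`.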

Results: We explore transductive (test nodes included in the training graph) and inductive (test nodes not part of the training graph) learning models. We show that ELP achieves a high area under the curve (AUC) when learning node embeddings using both graph connectivity and node attributes. Further, we show that graph embedding improves link prediction AUC by 30% over fingerprint-based similarity approaches and by 8% over support vector machines. We compare ELP against rule-based methods. We also evaluate ELP for predicting links in pathway maps and for reconstructing edges in reaction networks of four common gut microbiota phyla: Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. To emphasize the importance of graph embedding in the context of biochemical networks, we illustrate how graph embedding can guide visualization.
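
The evaluation terms used above can be made concrete with a small sketch. It assumes Tanimoto similarity as one example of a fingerprint-based similarity score (the abstract does not name the specific measure), represents fingerprints as sets of on-bit indices, and computes AUC as the probability that a known link outscores a non-link. The function names and toy data are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def inductive_split(edges, held_out_nodes):
    """Edges touching a held-out node go to the test set; the rest train.

    A transductive split would instead hold out edges only, so every test
    node still appears in the training graph.
    """
    train = [e for e in edges if not (set(e) & held_out_nodes)]
    test = [e for e in edges if set(e) & held_out_nodes]
    return train, test


# Toy fingerprints (assumed data, not derived from real molecules).
fingerprints = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {7, 8}, "D": {1, 2, 3, 4}}
known_links = [("A", "B"), ("A", "D")]
non_links = [("A", "C"), ("B", "C")]

pos_scores = [tanimoto(fingerprints[u], fingerprints[v]) for u, v in known_links]
neg_scores = [tanimoto(fingerprints[u], fingerprints[v]) for u, v in non_links]
baseline_auc = auc(pos_scores, neg_scores)

train_edges, test_edges = inductive_split(
    [("A", "B"), ("B", "C"), ("C", "D")], held_out_nodes={"D"}
)
```

The same `auc` function applies unchanged to scores produced by an embedding-based model, which is what makes a comparison across methods possible.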

Availability and implementation: The code and datasets are available through https://github.com/HassounLab/ELP.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Machine Learning*