Data-driven network alignment

PLoS One. 2020 Jul 2;15(7):e0234978. doi: 10.1371/journal.pone.0234978. eCollection 2020.

Abstract

In this study, we deal with the problem of biological network alignment (NA), which aims to find a node mapping between species' molecular networks that uncovers similar network regions, thus allowing for the transfer of functional knowledge between the aligned nodes. We provide evidence that current NA methods, which assume that topologically similar nodes (i.e., nodes whose network neighborhoods are isomorphic-like) have high functional relatedness, do not actually end up aligning functionally related nodes. That is, we show that the current topological similarity assumption does not hold well. Consequently, we argue that a paradigm shift is needed in how the NA problem is approached. We therefore redefine NA as a data-driven framework, called TARA (data-driven NA), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming, as traditional NA methods do, that topological relatedness corresponds to topological similarity. Unlike existing NA methods, TARA does not fix in advance which nodes should be aligned; it learns this from data. Specifically, TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns (features). We find that TARA is able to make accurate predictions. TARA then includes in the alignment each pair of nodes that is predicted to be functionally related. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. As currently implemented, TARA uses topological but not protein sequence information for this transfer. In this setting, we find that TARA outperforms two existing state-of-the-art NA methods that also use topological information, WAVE and SANA, and even outperforms or complements PrimAlign, a state-of-the-art NA method that uses both topological and sequence information. Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance. The software and data are available at http://www.nd.edu/~cone/TARA/.
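The pipeline outlined in the abstract can be sketched in a few dozen lines of Python. In this sketch, node pairs are first labeled, a classifier is trained on topological features, every pair predicted as related is aligned, and annotations are then transferred along the alignment. This is a minimal illustration under stated assumptions, not TARA's implementation. The following are all hypothetical choices made for this example:

* degree and triangle-count node features
* a concatenated pair representation
* plain logistic regression as the classifier
* a 0.5 decision threshold
* training on all annotated pairs instead of a held-out split
* toy networks with function labels T1, T2, T3

The abstract says only "network topological patterns (features)" and does not name TARA's actual features or classifier.

```python
import math
from itertools import product

# A network is an undirected graph stored as node -> set of neighbours.
Graph = dict[str, set[str]]


def build(edges: list[tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    g: Graph = {}
    for x, y in edges:
        g.setdefault(x, set()).add(y)
        g.setdefault(y, set()).add(x)
    return g


def node_features(g: Graph, v: str) -> list[float]:
    """Degree and triangle count of v (assumed stand-ins for TARA's topological features)."""
    nbrs = g[v]
    # Count each neighbour pair once (a < b) if the two neighbours are adjacent.
    triangles = sum(1 for a in nbrs for b in nbrs if a < b and b in g[a])
    return [float(len(nbrs)), float(triangles)]


def pair_features(ga: Graph, u: str, gb: Graph, v: str) -> list[float]:
    # Concatenate rather than compare: no "similar topology => related" rule is
    # built in; the classifier learns whatever relationship the labels support.
    return node_features(ga, u) + node_features(gb, v)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


Model = tuple[list[float], float]  # (weights, bias)


def predict(model: Model, x: list[float]) -> float:
    """Estimated probability that the node pair described by x is functionally related."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X: list[list[float]], y: list[int], lr: float = 0.1, epochs: int = 2000) -> Model:
    """Logistic regression fitted by batch gradient descent on the mean cross-entropy."""
    model: Model = ([0.0] * len(X[0]), 0.0)
    n = len(X)
    for _ in range(epochs):
        w, b = model
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(model, xi) - yi  # d(loss)/d(logit) for this sample
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        model = ([wj - lr * g / n for wj, g in zip(w, grad_w)], b - lr * grad_b / n)
    return model


def align_and_transfer(ga, gb, ann_a, ann_b, threshold=0.5):
    # 1. Training labels: an annotated pair is "related" if it shares a function label.
    train = list(product(ann_a, ann_b))
    X = [pair_features(ga, u, gb, v) for u, v in train]
    y = [int(bool(ann_a[u] & ann_b[v])) for u, v in train]
    model = train_logreg(X, y)
    # 2. Alignment: every cross-network pair predicted as related (may be many-to-many).
    alignment = [(u, v) for u, v in product(ga, gb)
                 if predict(model, pair_features(ga, u, gb, v)) >= threshold]
    # 3. Transfer: each node of gb inherits the labels of the ga nodes it is aligned to.
    transferred: dict[str, set[str]] = {}
    for u, v in alignment:
        transferred.setdefault(v, set()).update(ann_a.get(u, set()))
    return model, alignment, transferred


# Hypothetical toy data: two small networks and made-up function labels T1, T2, T3.
ga = build([("a1", "a2"), ("a1", "a3"), ("a2", "a3"), ("a1", "a4")])
gb = build([("b1", "b2"), ("b1", "b3"), ("b2", "b3"), ("b1", "b4"), ("b4", "b5")])
ann_a = {"a1": {"T1"}, "a2": {"T2"}, "a3": {"T2"}, "a4": {"T3"}}
ann_b = {"b1": {"T1"}, "b2": {"T2"}, "b4": {"T3"}}  # b3 and b5 are unannotated
model, alignment, transferred = align_and_transfer(ga, gb, ann_a, ann_b)
print(alignment)
print(transferred)
```

The pair features are concatenated rather than reduced to a similarity score. As a result, the classifier is not forced to treat topologically similar nodes as related, which mirrors the abstract's argument. A real evaluation would instead measure prediction accuracy on held-out node pairs and score the transferred labels against known annotations.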

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology
  • Gene Ontology
  • Humans
  • Protein Interaction Mapping / methods*
  • Protein Interaction Maps*
  • Proteomics / methods
  • Software

Grants and funding

This work is supported by the Air Force Office of Scientific Research Young Investigator Research Program (https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503214) FA9550-16-1-0147, awarded to TM, and the National Science Foundation Faculty Early Career Development Program (https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503214) CCF-1452795, awarded to TM. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.