Prediction of interactions between cell surface proteins by machine learning

Proteins. 2024 Apr;92(4):567-580. doi: 10.1002/prot.26648. Epub 2023 Dec 5.

Abstract

Cells detect changes in their external environments or communicate with each other through proteins on their surfaces. These cell surface proteins form a complicated network of interactions in order to fulfill their functions. The interactions between cell surface proteins are highly dynamic and, thus, challenging to detect using traditional experimental techniques. Here, we tackle this challenge using a computational framework. The primary focus of the framework is to develop new tools to identify interactions between domains in the immunoglobulin (Ig) fold, which is the most abundant domain family in cell surface proteins. These interactions could be formed between ligands and receptors from different cells or between proteins on the same cell surface. In practice, we collected all structural data on Ig domain interactions and transformed them into an interface fragment pair library. A high-dimensional profile can then be constructed from the library for a given pair of query protein sequences. Multiple machine learning models were used to read this profile so that the probability of interaction between the query proteins could be predicted. We tested our models on an experimentally derived dataset that contains 564 cell surface proteins in humans. The cross-validation results show that we can achieve higher than 70% accuracy in identifying the PPIs within this dataset. We then applied this method to a group of 46 cell surface proteins in Caenorhabditis elegans. We screened every possible interaction between these proteins. Many interactions recognized by our machine learning classifiers have been experimentally confirmed in the literature. In conclusion, our computational platform serves as a useful tool to help identify potential new interactions between cell surface proteins in addition to current state-of-the-art experimental techniques. The tool is freely accessible for use by the scientific community. Moreover, the general framework of the machine learning classification can also be extended to study the interactions of proteins in other domain superfamilies.

Keywords: cell surface protein; immunoglobulin domain; immunoglobulin fold; machine learning; protein-protein interactions.

MeSH terms

  • Amino Acid Sequence
  • Humans
  • Immunoglobulins
  • Ligands
  • Machine Learning*
  • Membrane Proteins*

Substances

  • Membrane Proteins
  • Immunoglobulins
  • Ligands