Integrating Multimeric Threading With High-throughput Experiments for Structural Interactome of Escherichia coli

J Mol Biol. 2021 May 14;433(10):166944. doi: 10.1016/j.jmb.2021.166944. Epub 2021 Mar 16.

Abstract

Genome-wide protein-protein interaction (PPI) determination remains a significant unsolved problem in structural biology. The difficulty is twofold: high-throughput experiments (HTEs) often have a relatively high false-positive rate in assigning PPIs, and PPI quaternary structures are more difficult to solve than tertiary structures with traditional structural biology techniques. We proposed a uniform pipeline, Threpp, to address both problems. Starting from a pair of monomer sequences, Threpp first threads both sequences through a complex structure library; the resulting alignment score is combined with HTE data in a naïve Bayesian classifier model to predict the likelihood that the two chains interact with each other. Next, quaternary complex structures of the identified PPIs are constructed by reassembling the monomeric alignments with dimeric threading frameworks through interface-specific structural alignments. Applied to the Escherichia coli genome, the pipeline identified 35,125 confident PPIs, 4.5-fold more than HTE alone. Graph analyses of the PPI networks show a scale-free cluster size distribution, consistent with previous studies; this property was found to be critical to the robustness of genome evolution and to the centrality of functionally important proteins that are essential for E. coli survival. Furthermore, complex structure models were constructed for all predicted E. coli PPIs from the quaternary threading alignments; 6771 of them have a high confidence score corresponding to a correct complex fold (TM-score > 0.5), and 39 show close agreement with subsequently released experimental structures (average TM-score = 0.73). These results demonstrate that threading-based homology modeling is highly useful for both genome-wide PPI network detection and complex structure construction.
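The abstract states that Threpp combines the threading alignment score with HTE data through a naïve Bayesian classifier. Below is a minimal sketch of that kind of evidence combination. It assumes the alignment score is discretized into bins and the HTE evidence is a binary detected/not-detected flag. The bin edges, likelihood tables, prior, and all names (`FeatureModel`, `bin_threading_score`, `posterior_interaction_probability`, `EXAMPLE_FEATURES`) are hypothetical illustrations, not Threpp's actual features or trained parameters.

```python
from dataclasses import dataclass


@dataclass
class FeatureModel:
    """Likelihoods P(value | interacting) and P(value | non-interacting) for one feature.

    All tables used below are hypothetical placeholders, not values from Threpp.
    """
    p_given_ppi: dict
    p_given_non: dict

    def likelihood_ratio(self, value):
        # Assumes the non-interacting likelihood is nonzero for every bin.
        return self.p_given_ppi[value] / self.p_given_non[value]


def bin_threading_score(score, edges=(0.3, 0.5, 0.7)):
    """Map a continuous alignment score to a bin index 0..len(edges) (hypothetical edges)."""
    return sum(score >= e for e in edges)


def posterior_interaction_probability(prior, features, observations):
    """Naive Bayes: posterior odds = prior odds * product of per-feature likelihood ratios."""
    odds = prior / (1.0 - prior)
    for name, value in observations.items():
        odds *= features[name].likelihood_ratio(value)
    return odds / (1.0 + odds)


# Hypothetical likelihood tables: threading-score bins 0..3 and HTE detection True/False.
EXAMPLE_FEATURES = {
    "threading": FeatureModel(
        p_given_ppi={0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4},
        p_given_non={0: 0.7, 1: 0.2, 2: 0.07, 3: 0.03},
    ),
    "hte": FeatureModel(
        p_given_ppi={True: 0.5, False: 0.5},
        p_given_non={True: 0.05, False: 0.95},
    ),
}

if __name__ == "__main__":
    obs = {"threading": bin_threading_score(0.75), "hte": True}
    print(posterior_interaction_probability(0.5, EXAMPLE_FEATURES, obs))
```

Because the features are treated as conditionally independent given the interaction label, each source of evidence simply multiplies the odds. That independence assumption is what makes the classifier "naïve", and it lets a weak threading signal be reinforced (or overruled) by HTE evidence without fitting any joint model.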

Keywords: Escherichia coli genome; multiple-chain threading; network centrality; protein-protein interaction networks; structural interactome.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bayes Theorem
  • Cluster Analysis
  • Escherichia coli / genetics*
  • Escherichia coli / metabolism
  • Escherichia coli Proteins / chemistry
  • Escherichia coli Proteins / genetics*
  • Escherichia coli Proteins / metabolism
  • Gene Expression Regulation, Bacterial
  • Genome, Bacterial
  • HSP70 Heat-Shock Proteins / chemistry
  • HSP70 Heat-Shock Proteins / genetics*
  • HSP70 Heat-Shock Proteins / metabolism
  • Phosphotransferases / chemistry
  • Phosphotransferases / genetics*
  • Phosphotransferases / metabolism
  • Protein Folding
  • Protein Interaction Mapping
  • Protein Interaction Maps / genetics
  • Protein Structure, Quaternary
  • Proteome / chemistry
  • Proteome / genetics*
  • Proteome / metabolism
  • Signal Transduction
  • Transcription Factors / chemistry
  • Transcription Factors / genetics*
  • Transcription Factors / metabolism

Substances

  • Escherichia coli Proteins
  • FtsA protein, E coli
  • HSP70 Heat-Shock Proteins
  • Proteome
  • RcsB protein, E coli
  • Transcription Factors
  • RcsA protein, E coli
  • Phosphotransferases
  • rcsD protein, E coli
  • dnaK protein, E coli