Multi-criteria protein structure comparison and structural similarities analysis using pyMCPSC

PLoS One. 2018 Oct 17;13(10):e0204587. doi: 10.1371/journal.pone.0204587. eCollection 2018.

Abstract

Protein Structure Comparison (PSC) is a well-developed field of computational proteomics with active interest from the research community, since it is widely used in structural biology and drug discovery. With new PSC methods continuously emerging and no clear method of choice, Multi-Criteria Protein Structure Comparison (MCPSC) is commonly employed to combine methods and generate consensus structural similarity scores. We present pyMCPSC, a Python-based utility we developed to allow users to perform MCPSC efficiently by exploiting the parallelism afforded by the multi-core CPUs of today's desktop computers. We show how pyMCPSC facilitates the analysis of similarities in protein domain datasets and how it can be extended to incorporate new PSC methods as they become available. We exemplify the power of pyMCPSC using a case study based on the Proteus_300 dataset. Results generated using pyMCPSC show that MCPSC scores form a reliable basis for identifying the true classification of a domain, as evidenced by both the ROC analysis and the Nearest-Neighbor analysis. Structure-similarity-based "Phylogenetic Tree" representations generated by pyMCPSC provide insight into functional grouping within the dataset of domains. Furthermore, scatter plots generated by pyMCPSC show strong correlation among protein domains belonging to SCOP Class C and loose correlation among those of SCOP Class D. Such analyses and the corresponding visualizations help users quickly gain insights about their datasets. The source code of pyMCPSC is available under the GPLv3.0 license through a GitHub repository (https://github.com/xulesc/pymcpsc).
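
The idea of combining several PSC methods into one consensus score can be illustrated with a minimal sketch. It is not taken from pyMCPSC and does not use its API: the function names, the method labels (`method_a`, `method_b`), the domain identifiers, and the scores are all hypothetical, and the scheme shown (per-method z-score normalization followed by an unweighted mean, with higher scores assumed to mean greater similarity) is only one plausible way to form a consensus.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale a list of scores to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: the method carries no ranking information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def consensus_scores(scores_by_method):
    """Combine per-method pairwise scores into one consensus score per pair.

    scores_by_method maps a method name to {domain_pair: score}.
    Only pairs scored by every method are kept (an assumption of this sketch).
    """
    pairs = sorted(set.intersection(*(set(s) for s in scores_by_method.values())))
    normalized = {}
    for method, scores in scores_by_method.items():
        z = zscore([scores[p] for p in pairs])
        normalized[method] = dict(zip(pairs, z))
    # Unweighted mean of the normalized scores across methods.
    return {p: mean(normalized[m][p] for m in normalized) for p in pairs}


# Hypothetical scores on different scales from two made-up methods.
scores = {
    "method_a": {("d1", "d2"): 0.9, ("d1", "d3"): 0.2, ("d2", "d3"): 0.4},
    "method_b": {("d1", "d2"): 8.0, ("d1", "d3"): 1.0, ("d2", "d3"): 3.0},
}
consensus = consensus_scores(scores)
best_pair = max(consensus, key=consensus.get)
print(best_pair, round(consensus[best_pair], 3))
```

Normalizing each method separately keeps a method with a wide numeric range from dominating the average; a weighted mean could replace the unweighted one if some methods are trusted more than others.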

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Visualization
  • Databases, Protein
  • Phylogeny
  • Protein Conformation*
  • Protein Domains
  • Proteomics / methods*
  • ROC Curve
  • Software*

Grants and funding

This research was supported by the European Union (European Social Fund ESF) and Greek national funds, through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program "Heracleitus II."