Reducing Ensembles of Protein Tertiary Structures Generated De Novo via Clustering

Molecules. 2020 May 9;25(9):2228. doi: 10.3390/molecules25092228.

Abstract

Controlling the quality of tertiary structures computed for a protein molecule remains a central challenge in de novo protein structure prediction. The rule of thumb is to generate as many structures as can be afforded, on the premise that having more structures increases the likelihood that some will reside near the sought biologically active structure. A major drawback of this approach is that computing a large number of structures imposes time and space costs. In this paper, we propose a novel clustering-based approach that we demonstrate significantly reduces an ensemble of generated structures without sacrificing quality. Evaluations are reported on both benchmark and CASP target proteins. Structure ensembles subjected to the proposed approach and the source code of the proposed approach are publicly available at the links provided in Section 1.
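
The abstract does not say which clustering algorithm or structure representation is used, so the following is only a minimal sketch of the general idea of reducing a decoy ensemble by clustering, not the authors' method. It assumes each decoy is a list of pre-aligned (x, y, z) coordinates with an associated energy score, and it uses a simple greedy "leader" clustering with a distance cutoff. The names `rmsd`, `reduce_ensemble`, and `cutoff` are hypothetical and chosen for illustration.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumption: the structures are already superimposed; no alignment is done here.
    """
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def reduce_ensemble(decoys, energies, cutoff):
    """Return the indices of the decoys kept after greedy leader clustering.

    Decoys are visited from lowest to highest energy. A decoy becomes a new
    cluster leader only if it is farther than `cutoff` from every existing
    leader; otherwise it is absorbed into an existing cluster and dropped.
    """
    order = sorted(range(len(decoys)), key=lambda i: energies[i])
    leaders = []
    for i in order:
        if all(rmsd(decoys[i], decoys[j]) > cutoff for j in leaders):
            leaders.append(i)
    return leaders
```

Keeping one representative per cluster shrinks the ensemble while retaining a structure from each distinct region of the sampled space. Visiting decoys in energy order is a design choice of this sketch so that each cluster is represented by its lowest-energy member.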

Keywords: clustering; decoy ensemble; protein structure prediction; reduction; tertiary structure.

MeSH terms

  • Cluster Analysis
  • Computational Biology / methods*
  • Models, Molecular
  • Protein Folding
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Software

Substances

  • Proteins