Granular clustering of de novo protein models

Bioinformatics. 2017 Feb 1;33(3):390-396. doi: 10.1093/bioinformatics/btw628.

Abstract

Motivation: Modern algorithms for de novo prediction of protein structures typically output multiple full-length models (decoys) rather than a single solution. Subsequent clustering of such decoys is used both to gauge the success of the modelling and to decide on the most native-like conformation. At the same time, partial protein models are sufficient for some applications, in particular crystallographic phasing by molecular replacement (MR), provided these models represent a certain part of the target structure with reasonable accuracy.

Results: Here we propose a novel clustering algorithm that natively operates in the space of partial models through an approach known as granular clustering (GC). The algorithm is based on growing local similarities found in a pool of initial decoys. We demonstrate that the resulting clusters of partial models provide substantially more accurate structural detail on the target protein than those obtained upon a global alignment of decoys. As a result, the partial models output by our GC algorithm are also much more effective for the MR procedure than the models produced by existing software.
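The idea of growing a local similarity can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each decoy is reduced to one backbone angle per residue (in degrees), and the helper names (angle_diff, agreeing, grow_fragment), the tolerance and the minimum cluster size are all hypothetical choices made for illustration. Starting from a short seed window, the fragment is extended one residue at a time for as long as enough decoys still agree with a reference decoy over the whole fragment.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def agreeing(decoys, ref, start, end, tol):
    """Indices of decoys within `tol` of decoy `ref` at every residue in [start, end)."""
    return [
        k for k, d in enumerate(decoys)
        if all(angle_diff(d[i], decoys[ref][i]) <= tol for i in range(start, end))
    ]


def grow_fragment(decoys, ref, start, end, tol=30.0, min_members=3):
    """Greedily extend the seed [start, end) while at least `min_members` decoys agree.

    Toy stand-in for "growing local similarities"; all thresholds are assumptions.
    """
    n = len(decoys[ref])
    members = agreeing(decoys, ref, start, end, tol)
    if len(members) < min_members:
        return None  # the seed itself is not shared by enough decoys
    grew = True
    while grew:
        grew = False
        if end < n:  # try to extend by one residue towards the C-terminus
            kept = agreeing(decoys, ref, start, end + 1, tol)
            if len(kept) >= min_members:
                end, members, grew = end + 1, kept, True
        if start > 0:  # try to extend by one residue towards the N-terminus
            kept = agreeing(decoys, ref, start - 1, end, tol)
            if len(kept) >= min_members:
                start, members, grew = start - 1, kept, True
    return start, end, members


# Four made-up decoys of six residues each (one angle per residue).
decoys = [
    [-60, -62, -58, -61, 120, 130],
    [-63, -60, -57, -65, 125, -100],
    [-58, -61, -60, -59, -150, 60],
    [100, -64, -59, -60, 118, 140],
]
result = grow_fragment(decoys, ref=0, start=1, end=3, tol=15.0, min_members=3)
print(result)  # (0, 4, [0, 1, 2])
```

In this example the seed covering residues 1 to 2 is shared by all four decoys. It grows to residues 0 to 3, losing one decoy along the way, and stops because extending further would leave fewer than three agreeing decoys. The output is a partial model (a fragment plus the decoys that support it) rather than a cluster of full-length models.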

Availability and implementation: The source code is freely available at https://github.com/biocryst/gc

Contact: sergei.strelkov@kuleuven.be

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Computational Biology / methods*
  • Models, Molecular*
  • Protein Conformation*
  • Sequence Analysis, Protein / methods*
  • Software*