State-of-the-art web services for de novo protein structure prediction

Brief Bioinform. 2021 May 20;22(3):bbaa139. doi: 10.1093/bib/bbaa139.

Abstract

Residue coevolution estimations coupled to machine learning methods are revolutionizing the ability of protein structure prediction approaches to model proteins that lack clear homologous templates in the Protein Data Bank (PDB). This was evident in the last round of the Critical Assessment of Structure Prediction (CASP), which presented several very good models for the hardest targets. Unfortunately, the literature reporting on these advances often lacks digests tailored to lay end users; moreover, some of the top-ranking predictors do not provide web servers that nonexperts can use. How, then, can end users benefit from these advances and correctly interpret the predicted models? Here we review the web resources that biologists can use today to take advantage of these state-of-the-art methods in their research, including not only the best de novo modeling servers but also datasets of models precomputed by experts for structurally uncharacterized protein families. We highlight their features, advantages and pitfalls for predicting structures of proteins without clear templates. We present a broad range of applications, from driving forward biochemical investigations that lack experimental structures to assisting experimental structure determination by X-ray diffraction, cryo-EM and other forms of integrative modeling. We also discuss issues that users must consider but that still require further development, such as global and residue-wise model quality estimates and sources of residue coevolution other than monomeric tertiary structure.

Keywords: machine learning; molecular modeling; AlphaFold; CASP; coevolution; structure prediction.

Publication types

  • Review

MeSH terms

  • Animals
  • Computational Biology*
  • Databases, Protein*
  • Humans
  • Machine Learning*
  • Models, Molecular*
  • Protein Conformation
  • Protein Folding*
  • Proteins
  • Sequence Analysis, Protein*
  • Software*

Substances

  • Proteins