Comprehensive large-scale assessment of intrinsic protein disorder

Bioinformatics. 2015 Jan 15;31(2):201-8. doi: 10.1093/bioinformatics/btu625. Epub 2014 Sep 21.

Abstract

Motivation: Intrinsically disordered regions are key for the function of numerous proteins. Because disorder is difficult to characterize experimentally, many computational predictors have been developed, each targeting different flavors of disorder. Their performance is generally measured on small datasets drawn mainly from experimentally solved structures, e.g. Protein Data Bank (PDB) chains. MobiDB has only recently started to collect disorder annotations from multiple experimental structures.

Results: MobiDB annotates disorder for UniProt sequences, allowing us to conduct the first large-scale assessment of fast disorder predictors on 25 833 different sequences with X-ray crystallographic structures. In addition to a comprehensive ranking of predictors, this analysis produced the following interesting observations. (i) The predictors cluster according to their disorder definition, with a consensus giving more confidence. (ii) Previous assessments appear over-reliant on data annotated at the PDB chain level and performance is lower on entire UniProt sequences. (iii) Long disordered regions are harder to predict. (iv) Depending on the structural and functional types of the proteins, differences in prediction performance of up to 10% are observed.
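Observation (i), that a consensus of predictors gives more confidence, can be illustrated with a minimal sketch. The abstract does not specify how the consensus was built or which performance measures were used, so everything below is an assumption made for illustration: the per-residue label encoding ('D' for disordered, 'S' for structured, any other character for unannotated), the simple majority vote, the choice of balanced accuracy as the score, and the function names `majority_consensus` and `balanced_accuracy`.

```python
from typing import Dict


def majority_consensus(predictions: Dict[str, str]) -> str:
    """Per-residue majority vote over equal-length 'D'/'S' strings.

    Hypothetical encoding: 'D' = disordered, 'S' = structured.
    A residue is called disordered only if more than half of the
    predictors call it disordered.
    """
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    n_predictors = len(predictions)
    consensus = []
    for column in zip(*predictions.values()):
        votes = column.count("D")
        consensus.append("D" if 2 * votes > n_predictors else "S")
    return "".join(consensus)


def balanced_accuracy(reference: str, predicted: str) -> float:
    """Mean of sensitivity and specificity, with 'D' as the positive class.

    Residues whose reference label is neither 'D' nor 'S' are treated
    as unannotated and skipped (an assumption of this sketch).
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and prediction must have the same length")
    tp = tn = fp = fn = 0
    for ref, pred in zip(reference, predicted):
        if ref not in ("D", "S"):
            continue
        if ref == "D":
            if pred == "D":
                tp += 1
            else:
                fn += 1
        else:
            if pred == "S":
                tn += 1
            else:
                fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Toy predictions from three hypothetical predictors for one sequence.
    toy = {"pred_a": "DDSS", "pred_b": "DSSS", "pred_c": "DDSD"}
    consensus = majority_consensus(toy)
    print(consensus, balanced_accuracy("DDSS", consensus))
```

Balanced accuracy is used here only because disordered residues are typically a minority class, which makes plain accuracy misleading; other measures would slot into the same loop.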

Availability: The datasets are available at http://mobidb.bio.unipd.it/lsd.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Crystallography, X-Ray
  • Databases, Protein
  • Humans
  • Molecular Sequence Annotation
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Sequence Analysis, Protein / methods*
  • Tumor Suppressor Protein p53 / chemistry*

Substances

  • Proteins
  • TP53 protein, human
  • Tumor Suppressor Protein p53