Where differences resemble: sequence-feature analysis in curated databases of intrinsically disordered proteins

Marco Necci; Damiano Piovesan; Silvio C E Tosatto

doi:10.1093/database/bay127

Database (Oxford). 2018 Jan 1:2018:bay127. doi: 10.1093/database/bay127.

Authors

Marco Necci¹ ² ³, Damiano Piovesan¹, Silvio C E Tosatto¹ ⁴

Affiliations

¹ Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58b, Padua, Italy.
² Department of Agricultural Sciences, University of Udine, Via Palladio 8, Udine, Italy.
³ Fondazione Edmund Mach, Via Edmund Mach 1, San Michele all'Adige, Italy.
⁴ Institute of Neuroscience, National Research Council, Corso Stati Uniti 4, Padua, Italy.

Abstract

Intrinsic disorder (ID) in proteins is involved in crucial interactions in the living cell. As the importance of ID is increasingly recognized, detailed analyses aimed at its identification and characterization are becoming more common. Whether ID 'flavors' representing different sub-phenomena exist remains an open question. Several databases collect manually curated examples of experimentally validated ID, focusing on apparently different aspects of this phenomenon. The recent update of MobiDB presented the opportunity to carry out an in-depth comparison of the content of these validated ID collections, namely DIBS, DisProt, IDEAL, MFIB, FuzDB, ELM and UniProt. To assess what is specific to different ID flavors, we analyzed relevant sequence-based features, such as amino acid composition, length, taxa and gene ontology terms, highlighting differences and similarities among datasets. Although the databases focus on apparently different aspects of ID, most of the considered features are not statistically different across them, with the exception of ELM. FuzDB also shares half of its entries with DisProt. In general, different ID databases describe similar phenomena. DisProt, which is the largest database, better represents the entire spectrum of different disorder flavors and the corresponding sequence diversity.
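
Two of the comparisons mentioned in the abstract, amino acid composition and entry overlap between databases, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `composition` and `shared_fraction`, the database labels `db_x` and `db_y`, and all sequences and identifiers are hypothetical toy values chosen only for illustration, and no statistical test is included.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Return the fraction of each standard amino acid over all residues.

    Non-standard characters are ignored. Hypothetical helper, not taken
    from the paper.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def shared_fraction(entries_a, entries_b):
    """Return the fraction of entries in A that also occur in B."""
    a, b = set(entries_a), set(entries_b)
    return len(a & b) / len(a) if a else 0.0


# Toy data (made up): disordered-region sequences and entry identifiers.
toy_regions = {"db_x": ["SSPEEK", "GGSPKK"], "db_y": ["PEPS"]}
toy_ids = {
    "db_x": {"ID_1", "ID_2", "ID_3", "ID_4"},
    "db_y": {"ID_2", "ID_3", "ID_9"},
}

comp_x = composition(toy_regions["db_x"])
overlap = shared_fraction(toy_ids["db_x"], toy_ids["db_y"])
print(f"Serine fraction in db_x: {comp_x['S']:.2f}")
print(f"Fraction of db_x entries also in db_y: {overlap:.2f}")
```

In a real analysis, the per-database composition vectors would then be compared with an appropriate statistical test; which test the authors used is not stated in the abstract.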

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology / methods*
Databases, Protein*
Intrinsically Disordered Proteins* / chemistry
Intrinsically Disordered Proteins* / genetics
Protein Conformation
Sequence Analysis, Protein / methods*

Substances

Intrinsically Disordered Proteins