The Complementarity Between Protein-Specific and General Pathogenicity Predictors for Amino Acid Substitutions

Casandra Riera; Natàlia Padilla; Xavier de la Cruz

doi:10.1002/humu.23048

Hum Mutat. 2016 Oct;37(10):1013-24. doi: 10.1002/humu.23048. Epub 2016 Aug 8.

Authors

Casandra Riera¹, Natàlia Padilla¹, Xavier de la Cruz²,³

Affiliations

¹ Research Unit in Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain.
² Research Unit in Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain. xavier.delacruz@vhir.org.
³ ICREA, Barcelona, Spain. xavier.delacruz@vhir.org.

PMID: 27397615
DOI: 10.1002/humu.23048

Abstract

The use of next-generation sequencing for biomedical/clinical purposes has fuelled the demand for tools that assess the functional impact of sequence variants. For single amino acid variants, general methods (GM), based on biophysical/evolutionary principles and trained by pooling variants from many proteins, are already available. Until now, their accuracy (∼80%) has limited their use in clinical applications. In parallel, a series of studies indicate that protein-specific predictors (PSP), which use only information from the protein of interest, could frequently surpass the performance of GM. However, two reasons suggest that this may not always be the case: the existence of a performance threshold affecting both GM and PSP, and the effect of training data scarcity. Here, we characterize the relationship between the two approaches by deriving 82 PSP and comparing them with several GM (PolyPhen-2, SIFT, PON-P2, MutationTaster2, CADD). We find a complementary relationship between PSP and GM, with neither approach always outperforming the other. However, the relationship varies between two limiting situations: PSP are frequently outperformed by PON-P2, the best GM, whereas the opposite happens when we compare PSP with SIFT. Finally, we explore how the observed complementarity could lead to increased success rates in pathogenicity prediction.

Keywords: amino acid variants; in silico pathogenicity predictions; missense variants; molecular diagnostics; next-generation sequencing.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Amino Acid Substitution*
Computational Biology / methods*
Humans
Proteins / genetics*
Software

Substances

Proteins