Family-specific analysis of variant pathogenicity prediction tools

Jan Zaucha; Michael Heinzinger; Svetlana Tarnovskaya; Burkhard Rost; Dmitrij Frishman

doi:10.1093/nargab/lqaa014


NAR Genom Bioinform. 2020 Feb 28;2(2):lqaa014. doi: 10.1093/nargab/lqaa014. eCollection 2020 Jun.

Authors

Jan Zaucha¹, Michael Heinzinger², Svetlana Tarnovskaya³, Burkhard Rost², Dmitrij Frishman¹

Affiliations

¹ Department of Bioinformatics, Technical University of Munich, 85354 Freising, Germany.
² Department of Informatics, Bioinformatics & Computational Biology-i12, Technical University of Munich, 85748 Garching, Germany.
³ Almazov National Medical Research Centre, St. Petersburg 197341, Russia.

Abstract

Using currently available datasets of annotated missense variants, we benchmarked tools for predicting the pathogenicity of single amino acid variants separately for each protein family. We find that despite the high overall accuracy of all tested methods, each tool has its Achilles heel, i.e. protein families in which its predictions prove unreliable (for every method, there are families where its expected accuracy does not exceed 51%). As a proof of principle, we show that choosing the optimal tool and pathogenicity threshold for each protein family individually yields reliable predictions in all Pfam domains (accuracy of at least 68%). A functional analysis of the sets of protein domains annotated exclusively with pathogenic or exclusively with neutral mutations indicates that specific protein functions can be associated with high or low sensitivity to mutations, respectively. The highly sensitive domains are involved in the regulation of transcription and DNA sequence-specific transcription factor binding, while domains whose mutation does not result in disease mediate immune and stress responses. These results suggest that future pathogenicity predictors, and especially variant prioritization tools, may benefit from taking functional annotation into account.
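The two central ideas of the benchmark, finding each tool's weakest family and picking the best tool and threshold per family, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout, the function names (`accuracy_at`, `worst_family`, `best_tool_per_family`), and the Pfam IDs and tool names in the demo are all hypothetical; scores are assumed to be oriented so that higher means more likely pathogenic; and plain accuracy (fraction of correctly classified variants) stands in for whatever metric the study actually used.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (pfam_family, tool_name, score, is_pathogenic).
# Assumption: higher score = more likely pathogenic for every tool.
Record = Tuple[str, str, float, bool]


def accuracy_at(scores: List[float], labels: List[bool], threshold: float) -> float:
    """Fraction of variants classified correctly when score >= threshold means pathogenic."""
    correct = sum((s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)


def worst_family(records: List[Record], tool: str, threshold: float) -> Tuple[str, float]:
    """Return the family (and its accuracy) where `tool` at one global threshold does worst."""
    per_family = defaultdict(lambda: ([], []))
    for family, rec_tool, score, label in records:
        if rec_tool == tool:
            per_family[family][0].append(score)
            per_family[family][1].append(label)
    return min(
        ((fam, accuracy_at(s, y, threshold)) for fam, (s, y) in per_family.items()),
        key=lambda item: item[1],
    )


def best_tool_per_family(records: List[Record]) -> Dict[str, Tuple[str, float, float]]:
    """For each family, return the (tool, threshold, accuracy) with the highest accuracy."""
    # family -> tool -> (scores, labels)
    grouped = defaultdict(lambda: defaultdict(lambda: ([], [])))
    for family, tool, score, label in records:
        scores, labels = grouped[family][tool]
        scores.append(score)
        labels.append(label)

    best: Dict[str, Tuple[str, float, float]] = {}
    for family, tools in grouped.items():
        for tool, (scores, labels) in sorted(tools.items()):
            # Observed scores plus +inf enumerate every distinct split of the family's variants.
            for t in sorted(set(scores)) + [math.inf]:
                acc = accuracy_at(scores, labels, t)
                if family not in best or acc > best[family][2]:
                    best[family] = (tool, t, acc)
    return best


# Toy data: toolA works in PF00001 but is inverted in PF00002, where toolB works.
records = [
    ("PF00001", "toolA", 0.9, True),
    ("PF00001", "toolA", 0.2, False),
    ("PF00001", "toolB", 0.4, True),
    ("PF00001", "toolB", 0.6, False),
    ("PF00002", "toolA", 0.8, False),
    ("PF00002", "toolA", 0.3, True),
    ("PF00002", "toolB", 0.7, True),
    ("PF00002", "toolB", 0.1, False),
]
print(worst_family(records, "toolA", 0.5))  # ('PF00002', 0.0)
print(best_tool_per_family(records))
# {'PF00001': ('toolA', 0.9, 1.0), 'PF00002': ('toolB', 0.7, 1.0)}
```

In this toy data toolA has an "Achilles heel" in PF00002, yet per-family selection still finds a reliable tool for every family. Because the threshold here is tuned on the same variants it is scored on, the reported accuracies are optimistic; a held-out split or cross-validation would be needed for an honest estimate.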