Evaluating the use of paralogous protein domains to increase data availability for missense variant classification

Genome Med. 2023 Dec 12;15(1):110. doi: 10.1186/s13073-023-01264-6.

Abstract

Background: Classification of rare missense variants remains an ongoing challenge in genomic medicine. Evidence of pathogenicity is often sparse, and decisions about how to weigh different evidence classes may be subjective. We used a Bayesian variant classification framework to investigate the performance of variant co-localisation, missense constraint, and aggregating data across paralogous protein domains ("meta-domains").

Methods: We constructed a database of all possible coding single nucleotide variants in the human genome and used Pfam predictions to annotate structurally equivalent positions across protein domains. We counted the number of pathogenic and benign missense variants at these equivalent positions in the ClinVar database, calculated a regional constraint score for each meta-domain, and assessed this approach versus existing missense constraint metrics for classifying variant pathogenicity and benignity.
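The counting step can be illustrated with a minimal sketch. This is not the authors' pipeline: the record layout, the gene and domain names, and both function names are hypothetical, and the sketch assumes each variant has already been mapped to a column of its domain family alignment, so that an identical (family, position) pair in two genes marks structurally equivalent residues.

```python
from collections import defaultdict

# Hypothetical records: (gene, domain_family, aligned_position, clinical_label).
# All names and values are invented for illustration only.
VARIANTS = [
    ("GENE_A", "DOMAIN_X", 12, "pathogenic"),
    ("GENE_B", "DOMAIN_X", 12, "pathogenic"),
    ("GENE_B", "DOMAIN_X", 12, "benign"),
    ("GENE_C", "DOMAIN_X", 40, "benign"),
    ("GENE_A", "DOMAIN_Y", 12, "pathogenic"),
]

LABELS = ("pathogenic", "benign")


def count_by_meta_position(variants):
    """Tally pathogenic and benign variants per (domain family, aligned position)."""
    counts = defaultdict(lambda: {label: 0 for label in LABELS})
    for _gene, family, position, label in variants:
        if label in LABELS:  # skip any other label, e.g. uncertain significance
            counts[(family, position)][label] += 1
    return dict(counts)


def paralogue_evidence(variants, gene, family, position):
    """Count labelled variants at the equivalent position in other genes only."""
    evidence = {label: 0 for label in LABELS}
    for other_gene, other_family, other_position, label in variants:
        same_site = (other_family, other_position) == (family, position)
        if other_gene != gene and same_site and label in LABELS:
            evidence[label] += 1
    return evidence


meta_counts = count_by_meta_position(VARIANTS)
query_evidence = paralogue_evidence(VARIANTS, "GENE_A", "DOMAIN_X", 12)
```

Excluding the query gene in `paralogue_evidence` separates evidence from paralogues from evidence at the same residue in the same protein, which the abstract treats as a distinct and stronger class.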

Results: Alternative pathogenic missense variants at the same amino acid position in the same protein provide strong evidence of pathogenicity (positive likelihood ratio, LR+ = 85). Additionally, clinically annotated pathogenic or benign missense variants at equivalent positions in different proteins can provide moderate evidence of pathogenicity (LR+ = 7) or benignity (LR+ = 5), respectively. Applying these approaches sequentially (through the ACMG/AMP criterion PM5) increases sensitivity for classifying pathogenic missense variants from 27% to 41%. Missense constraint can also provide strong evidence of pathogenicity for some variants, but its absence provides no evidence of benignity.
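For readers unfamiliar with likelihood ratios, the sketch below shows the standard definition, LR+ = sensitivity / (1 - specificity), and how an LR+ updates a prior probability through Bayes' theorem in odds form. The confusion-table counts and the 10% prior are assumptions chosen for illustration and are not taken from the study; only the LR+ values of 85 and 7 come from the abstract. The function names are hypothetical.

```python
def positive_likelihood_ratio(tp, fn, fp, tn):
    """LR+ = sensitivity / (1 - specificity), computed from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    if false_positive_rate == 0:
        raise ValueError("LR+ is undefined when there are no false positives")
    return sensitivity / false_positive_rate


def posterior_probability(prior, likelihood_ratio):
    """Bayes' theorem in odds form: posterior odds = prior odds * LR."""
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Illustrative counts only (assumed): 30 of 100 pathogenic variants and
# 2 of 100 benign variants meet the criterion under evaluation.
example_lr = positive_likelihood_ratio(tp=30, fn=70, fp=2, tn=98)

# Assumed prior probability of pathogenicity; LR+ values quoted in the abstract.
ASSUMED_PRIOR = 0.10
same_residue_posterior = posterior_probability(ASSUMED_PRIOR, 85)
paralogue_posterior = posterior_probability(ASSUMED_PRIOR, 7)
```

Under the assumed prior, an LR+ of 85 moves the probability of pathogenicity much further than an LR+ of 7, which is the sense in which the two evidence types are graded as strong and moderate.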

Conclusions: We propose using structurally equivalent positions across related protein domains from different genes to augment evidence for variant co-localisation when classifying novel missense variants. Additionally, we advocate adopting a numerical evidence-based approach to integrating diverse data in variant interpretation.

Keywords: Bayesian; Genomic medicine; Missense variant; Protein domain; Variant classification.

MeSH terms

  • Bayes Theorem
  • Computational Biology*
  • Humans
  • Mutation, Missense
  • Protein Domains
  • Proteins*

Substances

  • Proteins