Catching hidden variation: systematic correction of reference minor allele annotation in clinical variant calling

Genet Med. 2018 Mar;20(3):360-364. doi: 10.1038/gim.2017.168. Epub 2017 Oct 26.

Abstract

Purpose: We comprehensively assessed the influence of reference minor alleles (RMAs), one of the inherent problems of the human reference genome sequence.

Methods: The variant call format (VCF) files provided by the 1000 Genomes and Exome Aggregation Consortium (ExAC) consortia were used to identify RMA sites. All coding RMA sites were checked for concordance with UniProt and the presence of same codon variants. RMA-corrected predictions of functional effect were obtained with SIFT, PolyPhen-2, and PROVEAN standalone tools and compared with dbNSFP v2.9 for consistency.

Results: We systematically characterized the problem of RMAs and identified several possible ways in which RMA could interfere with accurate variant discovery and annotation. We have discovered a systematic bias in the automated variant effect prediction at the RMA loci, as well as widespread switching of functional consequences for variants located in the same codon as the RMA. As a convenient way to address the problem of RMAs we have developed a simple bioinformatic tool that identifies variation at RMA sites and provides correct annotations for all such substitutions. The tool is available free of charge at http://rmahunter.bioinf.me.

Conclusion: Correction of RMA annotation enhances the accuracy of next-generation sequencing-based methods in clinical practice.
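
The identification of RMA sites from population VCF files, as described in the Methods, might be sketched as follows. This is only an illustration under stated assumptions: a site is treated as an RMA when an alternate allele's frequency in the INFO `AF` field exceeds 0.5, which the abstract does not state explicitly. The function names `parse_info` and `find_rma_sites`, the `threshold` parameter, and the toy records are hypothetical and do not represent the authors' implementation or the RMAHunter tool.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class RmaSite(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_af: float


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into key -> value pairs (flags map to '')."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def find_rma_sites(vcf_lines: Iterable[str],
                   threshold: float = 0.5) -> Iterator[RmaSite]:
    """Yield sites where an alternate allele is more common than `threshold`.

    Assumption: an alternate allele frequency above 0.5 means the
    reference allele is the minor one at that site.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the header
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt_field = cols[:5]
        info = parse_info(cols[7])
        if "AF" not in info:
            continue  # no allele frequency annotation at this site
        # ALT and AF are comma-separated and listed in the same order.
        for alt, af_text in zip(alt_field.split(","), info["AF"].split(",")):
            try:
                af = float(af_text)
            except ValueError:
                continue  # missing value such as "."
            if af > threshold:
                yield RmaSite(chrom, int(pos), ref, alt, af)


# Toy records with made-up coordinates and frequencies.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t.\tPASS\tAF=0.93",
    "1\t2000\t.\tC\tT\t.\tPASS\tAF=0.02",
    "2\t3000\t.\tG\tA,T\t.\tPASS\tAC=5;AF=0.10,0.61",
]
sites = list(find_rma_sites(example))
for site in sites:
    print(site)
```

In this toy input, the first and third records are reported, because one alternate allele at each site has a frequency above 0.5. A sample that is homozygous for the reference allele at such a site carries the rarer allele yet produces no variant call, which is one of the ways RMAs can hide variation from a standard pipeline.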

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles*
  • Amino Acid Sequence
  • Amino Acid Substitution
  • Computational Biology / methods
  • Computational Biology / standards
  • Genetic Variation*
  • Genomics / methods
  • Genomics / standards
  • Humans
  • Molecular Sequence Annotation / standards*
  • Polymorphism, Single Nucleotide
  • Reproducibility of Results