Variant prioritization and analysis incorporating problematic regions of the genome

Pac Symp Biocomput. 2014:277-87.

Abstract

In case-control studies of rare Mendelian disorders and complex diseases, the power to detect variant- and gene-level associations of a given effect size is limited by the size of the study sample. Paradoxically, low statistical power may increase the likelihood that a statistically significant finding is a false positive. Prioritizing variants by call quality, putative effect on protein function, predicted deleteriousness, and allele frequency is often used to reduce false positives while preserving the set of variants most likely to contain true disease associations. We propose that specificity can be further improved by considering errors that are specific to the regions of the genome being sequenced. These problematic regions (PRs) are identified a priori and are used to down-weight their constituent variants in a case-control analysis. Using samples drawn from the 1000 Genomes Project, we illustrate the utility of PRs in identifying true variant and gene associations in a case-control study of a known Mendelian disease, cystic fibrosis (CF).
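
The down-weighting idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the weighting scheme or the test statistic, so the weight value (`PR_WEIGHT`), the region coordinates, the gene names, the carrier counts, and the simple weighted carrier-count difference used here are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    chrom: str
    pos: int               # 1-based position
    case_carriers: int     # number of cases carrying the variant
    control_carriers: int  # number of controls carrying the variant


# Hypothetical problematic regions: (chrom, start, end), 1-based inclusive.
PROBLEMATIC_REGIONS = [("7", 1000, 2000)]

# Assumed weight for variants inside a PR; all other variants get 1.0.
PR_WEIGHT = 0.1


def in_problematic_region(variant, regions):
    """Return True if the variant position falls inside any listed region."""
    return any(
        chrom == variant.chrom and start <= variant.pos <= end
        for chrom, start, end in regions
    )


def variant_weight(variant, regions, pr_weight=PR_WEIGHT):
    """Down-weight variants that fall inside a problematic region."""
    return pr_weight if in_problematic_region(variant, regions) else 1.0


def gene_burden(variants, regions, pr_weight=PR_WEIGHT):
    """Sum the weighted case-minus-control carrier difference per gene."""
    scores = {}
    for v in variants:
        w = variant_weight(v, regions, pr_weight)
        diff = v.case_carriers - v.control_carriers
        scores[v.gene] = scores.get(v.gene, 0.0) + w * diff
    return scores


# Toy data (invented): GENE_A's variant lies inside the PR, GENE_B's does not.
toy_variants = [
    Variant("GENE_A", "7", 1500, case_carriers=8, control_carriers=0),
    Variant("GENE_B", "7", 5000, case_carriers=6, control_carriers=1),
]
toy_scores = gene_burden(toy_variants, PROBLEMATIC_REGIONS)
```

In this toy example the variant in GENE_A has the larger raw case-control difference, but because it lies in a problematic region its contribution is scaled down, so GENE_B ranks higher. A real analysis would replace the carrier-count difference with a proper association test and derive the weights from the a priori PR annotation.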

MeSH terms

  • Case-Control Studies
  • Computational Biology
  • Cystic Fibrosis / genetics
  • Cystic Fibrosis Transmembrane Conductance Regulator / genetics
  • Databases, Genetic / statistics & numerical data
  • Exome
  • Genetic Association Studies / statistics & numerical data
  • Genetic Variation*
  • Genome, Human*
  • Genomic Library
  • Human Genome Project
  • Humans
  • Precision Medicine / statistics & numerical data
  • Sample Size
  • Sequence Alignment / statistics & numerical data

Substances

  • CFTR protein, human
  • Cystic Fibrosis Transmembrane Conductance Regulator