Quantifying deleterious effects of regulatory variants

Nucleic Acids Res. 2017 Mar 17;45(5):2307-2317. doi: 10.1093/nar/gkw1263.

Abstract

The majority of genome-wide association study (GWAS) risk variants reside in non-coding DNA sequences. Understanding how these sequence modifications lead to transcriptional alterations and cell-to-cell variability can help unravel genotype-phenotype relationships. Here, we describe a computational method, dubbed CAPE, which calculates the likelihood that a genetic variant deactivates enhancers by disrupting the binding of transcription factors (TFs) in a given cellular context. CAPE learns sequence signatures associated with putative enhancers identified by large-scale sequencing experiments (such as ChIP-seq or DNase-seq) and models the change in enhancer signature upon a single nucleotide substitution. CAPE accurately identifies causative cis-regulatory variation, including expression quantitative trait loci (eQTLs) and DNase I sensitivity quantitative trait loci (dsQTLs), in a tissue-specific manner and with precision superior to several currently available methods. The presented method can be trained on any tissue-specific dataset of enhancers and known functional variants and then applied to prioritize disease-associated variants in the corresponding tissue.
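The general idea of scoring how a single nucleotide substitution changes a learned sequence signature can be illustrated with a minimal sketch. This is not CAPE's actual model. It assumes a generic, hypothetical k-mer weight table (for example, weights learned by a classifier separating enhancer peaks from background sequence), scores a sequence as the sum of its overlapping k-mer weights, and reports the score difference between the alternative and reference alleles. The function names `kmer_score` and `variant_delta` and the toy weight below are illustrative assumptions.

```python
from typing import Dict


def kmer_score(seq: str, weights: Dict[str, float], k: int) -> float:
    """Sum the (hypothetical) weights of all overlapping k-mers in seq.

    k-mers missing from the weight table contribute 0.
    """
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def variant_delta(seq: str, pos: int, alt: str,
                  weights: Dict[str, float], k: int) -> float:
    """Change in signature score when the base at 0-based `pos` becomes `alt`.

    Negative values mean the substitution weakens the enhancer-like signature.
    """
    if not 0 <= pos < len(seq):
        raise ValueError("pos lies outside the sequence")
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    # Only k-mers that overlap `pos` can change, so restrict scoring to that
    # window: k-mers starting at pos-k+1 .. pos (clipped to the sequence).
    start = max(0, pos - k + 1)
    end = min(len(seq), pos + k)
    return (kmer_score(alt_seq[start:end], weights, k)
            - kmer_score(seq[start:end], weights, k))


if __name__ == "__main__":
    # Toy example: a single hypothetical weight for an E-box-like 6-mer.
    toy_weights = {"CACGTG": 2.0}
    ref = "AACACGTGAA"
    # Substituting C->A at position 4 destroys the only weighted 6-mer.
    print(variant_delta(ref, 4, "A", toy_weights, 6))
```

Restricting the computation to the window around the variant gives the same result as rescoring the whole sequence, because k-mers that do not overlap the substituted base are identical in both alleles and cancel out.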

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • B-Lymphocytes / cytology
  • B-Lymphocytes / metabolism
  • Base Sequence
  • Deoxyribonuclease I / metabolism
  • Enhancer Elements, Genetic*
  • Genetic Association Studies*
  • Genome, Human*
  • Genome-Wide Association Study
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Likelihood Functions
  • Machine Learning
  • Organ Specificity
  • Polymorphism, Single Nucleotide*
  • Protein Binding
  • Quantitative Trait Loci*
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*
  • Transcription, Genetic

Substances

  • Transcription Factors
  • Deoxyribonuclease I