Systematic Computational Identification of Variants That Activate Exonic and Intronic Cryptic Splice Sites

Am J Hum Genet. 2017 May 4;100(5):751-765. doi: 10.1016/j.ajhg.2017.04.001.

Abstract

We developed a variant-annotation method that combines sequence-based machine-learning classification with a context-dependent algorithm for selecting splice variants. Our approach is distinctive in that it compares the splice potential of a sequence bearing a variant with the splice potential of the reference sequence. After training, classification accurately identified 168 of 180 (93.3%) canonical splice sites of five genes. The combined method, CryptSplice, identified and correctly predicted the effect of 18 of 21 (86%) known splice-altering variants in CFTR, a well-studied gene whose loss-of-function variants cause cystic fibrosis (CF). Among 1,423 unannotated CFTR disease-associated variants, the method identified 32 potential exonic cryptic splice variants, two of which were experimentally evaluated and confirmed. After complete CFTR sequencing, the method found three cryptic intronic splice variants (one known and two experimentally verified) that completed the molecular diagnosis of CF in 6 of 14 individuals. CryptSplice interrogation of sequence data from six individuals with X-linked dyskeratosis congenita caused by an unknown disease-causing variant in DKC1 identified two splice-altering variants that were experimentally verified. To assess the extent to which disease-associated variants might activate cryptic splicing, we selected 458 pathogenic variants and 348 variants of uncertain significance (VUSs) classified as high confidence from ClinVar. Splice-site activation was predicted for 129 (28%) of the pathogenic variants and 75 (22%) of the VUSs. Our findings suggest that cryptic splice-site activation is more common than previously thought and should be routinely considered for all variants within the transcribed regions of genes.
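The reference-versus-variant comparison described in the abstract can be illustrated with a minimal sketch. This is not CryptSplice: the trained machine-learning classifier is replaced here by a toy scorer that counts matches to the textbook donor consensus MAG|GTRAGT, and the function names, the 9-base window, and both thresholds (`min_gain`, `min_score`) are assumptions made for illustration only. The sketch shows only the general idea of scoring every window that overlaps a substitution in both sequences and reporting windows whose donor score rises.

```python
# Toy stand-in for a trained splice-site classifier (an assumption, not the
# authors' model): donor consensus over 3 exonic + 6 intronic bases.
DONOR_CONSENSUS = ["AC", "A", "G", "G", "T", "AG", "A", "G", "T"]


def donor_score(nonamer):
    """Fraction of consensus positions matched; 0.0 without the GT dinucleotide."""
    if len(nonamer) != 9 or nonamer[3:5] != "GT":
        return 0.0
    matches = sum(base in allowed for base, allowed in zip(nonamer, DONOR_CONSENSUS))
    return matches / 9


def cryptic_donor_candidates(ref, pos, alt, min_gain=0.3, min_score=0.7):
    """Compare reference and variant scores for every window covering `pos`.

    `pos` is a 0-based index of a single-nucleotide substitution.
    Returns (window_start, ref_score, variant_score) for windows whose
    variant score is high and clearly above the reference score.
    Both thresholds are hypothetical values chosen for this example.
    """
    var = ref[:pos] + alt + ref[pos + 1:]
    hits = []
    first = max(0, pos - 8)
    last = min(pos, len(ref) - 9)
    for start in range(first, last + 1):
        ref_score = donor_score(ref[start:start + 9])
        var_score = donor_score(var[start:start + 9])
        if var_score >= min_score and var_score - ref_score >= min_gain:
            hits.append((start, ref_score, var_score))
    return hits


# Made-up sequence: a C>T substitution at index 7 creates a GT dinucleotide
# inside an otherwise consensus-like context.
example_ref = "CCTCAGGCAAGTACCT"
print(cryptic_donor_candidates(example_ref, 7, "T"))  # [(3, 0.0, 1.0)]
```

Scoring the reference and the variant with the same function means a site that is already strong in the reference is not reported; only a gain attributable to the variant is flagged. A real application would substitute a trained model for `donor_score` and would also need an analogous acceptor-site scorer.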

Keywords: cryptic splicing; cystic fibrosis; machine learning; minigene; pseudoexon; splice acceptor; splice donor; splice variant; splicing.

MeSH terms

  • Algorithms
  • Cell Cycle Proteins / genetics*
  • Cell Cycle Proteins / metabolism
  • Computational Biology*
  • Cystic Fibrosis / genetics
  • Cystic Fibrosis Transmembrane Conductance Regulator / genetics*
  • Cystic Fibrosis Transmembrane Conductance Regulator / metabolism
  • Dyskeratosis Congenita / genetics
  • Exons
  • Gene Expression Regulation
  • Genetic Loci
  • Genetic Variation*
  • Genomics
  • HEK293 Cells
  • Humans
  • Introns
  • Mutation, Missense
  • Nuclear Proteins / genetics*
  • Nuclear Proteins / metabolism
  • RNA Splice Sites*
  • RNA Splicing
  • Sequence Analysis, DNA
  • Support Vector Machine

Substances

  • CFTR protein, human
  • Cell Cycle Proteins
  • DKC1 protein, human
  • Nuclear Proteins
  • RNA Splice Sites
  • Cystic Fibrosis Transmembrane Conductance Regulator