Using secondary structure to predict the effects of genetic variants on alternative splicing

Hum Mutat. 2019 Sep;40(9):1270-1279. doi: 10.1002/humu.23790. Epub 2019 Jun 18.

Abstract

Accurate interpretation of genomic variants that alter RNA splicing is critical to precision medicine. We present a computational framework, Prediction of variant Effect on Percent Spliced In (PEPSI), that predicts the splicing impact of coding and noncoding variants for the Fifth Critical Assessment of Genome Interpretation (CAGI5) "Vex-seq" challenge. PEPSI is a random forest regression model trained on multiple layers of features associated with sequence conservation and regulatory sequence elements. In contrast to other splicing defect prediction tools from the literature, our framework integrates secondary structure information when predicting variants that disrupt splicing regulatory elements (SREs). We applied our model to classify splice-disrupting variants among 2,094 single-nucleotide polymorphisms from the Exome Aggregation Consortium, using the model-predicted change in percent spliced in (ΔPSI) for each tested variant. Benchmarking our model against widely used state-of-the-art tools, we demonstrate that PEPSI achieves comparable sensitivity and precision. We also show that secondary structure context can help resolve several cases in which the change in SRE counts does not correspond to the direction of the ΔPSI measured for the tested variant.
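
To make the quantities in the abstract concrete, the following is a minimal Python sketch of how ΔPSI might be computed and then used to call splice-disrupting variants and score those calls by sensitivity and precision. It is not the PEPSI implementation. The function names, the 0-to-1 PSI scale, the 0.1 cutoff on |ΔPSI|, and all numbers are assumptions chosen for illustration, and the "predicted" values are hard-coded stand-ins for the output of a regression model.

```python
from typing import Sequence, Tuple


def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Percent spliced in, expressed here as a fraction in [0, 1]."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads support either isoform")
    return inclusion_reads / total


def delta_psi(psi_variant: float, psi_reference: float) -> float:
    """Change in PSI caused by the variant (negative means more exon skipping)."""
    return psi_variant - psi_reference


def is_disrupting(dpsi: float, threshold: float = 0.1) -> bool:
    """Call a variant splice-disrupting if |dPSI| meets a cutoff.

    The 0.1 default is a hypothetical value, not one taken from the paper.
    """
    return abs(dpsi) >= threshold


def sensitivity_precision(
    predicted: Sequence[bool], observed: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity (recall) and precision of binary calls against observed labels."""
    tp = sum(p and o for p, o in zip(predicted, observed))
    fp = sum(p and not o for p, o in zip(predicted, observed))
    fn = sum(o and not p for p, o in zip(predicted, observed))
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    return sens, prec


# Toy data (made up): dPSI from a hypothetical model vs. measured dPSI.
predicted_dpsi = [-0.30, 0.02, 0.15, -0.05]
measured_dpsi = [-0.25, 0.12, 0.20, -0.01]

predicted_calls = [is_disrupting(d) for d in predicted_dpsi]
observed_calls = [is_disrupting(d) for d in measured_dpsi]
sens, prec = sensitivity_precision(predicted_calls, observed_calls)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

In this toy set the second variant is measured as disrupting but predicted as neutral, so it lowers sensitivity without affecting precision. Taking the absolute value treats increased inclusion and increased skipping symmetrically; a direction-aware evaluation would compare the signs of predicted and measured ΔPSI instead.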

Keywords: CAGI; RNA secondary structure; alternative splicing; splice-disrupting variants; splicing regulatory elements.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Alternative Splicing*
  • Animals
  • Computational Biology
  • Exome Sequencing
  • Humans
  • Polymorphism, Single Nucleotide*
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Proteins / genetics*
  • RNA Splice Sites
  • Regression Analysis

Substances

  • Proteins
  • RNA Splice Sites