regSNPs-splicing: a tool for prioritizing synonymous single-nucleotide substitution

Hum Genet. 2017 Sep;136(9):1279-1289. doi: 10.1007/s00439-017-1783-x. Epub 2017 Apr 8.

Abstract

While synonymous single-nucleotide variants (sSNVs) have been largely unstudied because they do not alter protein sequence, mounting evidence suggests that they may affect RNA conformation, splicing, and the stability of nascent mRNAs, and thereby promote various diseases. Accurately prioritizing deleterious sSNVs from a pool of neutral ones can significantly improve our ability to select functional genetic variants identified in genome-sequencing projects and, therefore, advance our understanding of disease etiology. In this study, we develop a computational algorithm to prioritize sSNVs based on their impact on mRNA splicing and protein function. In addition to genomic features that potentially affect splicing regulation, our proposed algorithm also includes dozens of structural features that characterize the effects of alternatively spliced exons on protein function. Our systematic evaluation of thousands of sSNVs suggests that several structural features, including protein intrinsic disorder scores, solvent accessible surface areas, protein secondary structures, and known and predicted protein family domains, differ significantly between disease-causing and neutral sSNVs. Our results suggest that protein structural features offer an added dimension of information for distinguishing disease-causing from neutral synonymous variants. The inclusion of structural features increases the predictive accuracy of functional sSNV prioritization.

Keywords: Position Specific Score Matrix; Position Weight Matrix; Random Forest; Solvent Accessible Surface Area; Splice Site.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Female
  • Genetic Diseases, Inborn / genetics*
  • Genetic Diseases, Inborn / metabolism
  • Humans
  • Male
  • Models, Genetic*
  • Polymorphism, Single Nucleotide*
  • RNA Splicing / genetics*