A simple method for analyzing exome sequencing data shows distinct levels of nonsynonymous variation for human immune and nervous system genes

PLoS One. 2012;7(6):e38087. doi: 10.1371/journal.pone.0038087. Epub 2012 Jun 6.

Abstract

To measure the strength of natural selection acting on single nucleotide variants (SNVs) in a set of human genes, we calculate the ratio of nonsynonymous SNVs (nsSNVs) per nonsynonymous site to synonymous SNVs (sSNVs) per synonymous site. We transform this ratio with a factor f that corrects for the bias of synonymous sites towards transitions in the genetic code and for the different mutation rates of transitions and transversions. The result approximates the relative density of nsSNVs (rdnsv) compared with the neutral expectation inferred from the density of sSNVs. Using SNVs from a diploid genome and 200 exomes, we apply our method to immune system genes (ISGs), nervous system genes (NSGs), randomly sampled genes (RSGs), and gene ontology annotated genes. The estimate of rdnsv in an individual exome is around 20% for NSGs and 30-40% for ISGs and RSGs. The smaller rdnsv of NSGs indicates stronger overall purifying selection. To quantify the relative shift of nsSNVs towards rare variants, we next fit a linear regression model to the estimates of rdnsv over different SNV allele frequency bins. The resulting regression models show a negative slope for NSGs, ISGs and RSGs, supporting an influence of purifying selection on the frequency spectrum of segregating nsSNVs. The y-intercept of the model predicts rdnsv for an allele frequency close to 0. Given the normalization applied to the observed nsSNV to sSNV ratio, this parameter can be interpreted as the proportion of nonsynonymous sites where mutations are tolerated to segregate in the population with an allele frequency notably greater than 0. NSGs display a smaller y-intercept, indicating more nonsynonymous sites under strong negative selection. This predicts more monogenically inherited or de novo mutation diseases affecting the nervous system.
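
The two computations described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the assumption that f is applied as a multiplicative factor, and the use of bin midpoints as the regression predictor are all hypothetical choices, since the abstract does not specify them. The counts in the usage section are invented placeholders, not data from the study.

```python
def rdnsv(ns_snvs, ns_sites, s_snvs, s_sites, f=1.0):
    """Relative density of nsSNVs versus the sSNV-based neutral expectation.

    Assumption: the correction factor f multiplies the raw density ratio.
    The abstract only says the ratio is "transformed" with f.
    """
    if ns_sites <= 0 or s_sites <= 0 or s_snvs <= 0:
        raise ValueError("site counts and sSNV count must be positive")
    ns_density = ns_snvs / ns_sites   # nsSNVs per nonsynonymous site
    s_density = s_snvs / s_sites      # sSNVs per synonymous site
    return f * ns_density / s_density


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Placeholder inputs for illustration only (not values from the paper).
    # Hypothetical allele frequency bin midpoints and per-bin counts:
    # (nsSNVs, nonsynonymous sites, sSNVs, synonymous sites).
    bin_midpoints = [0.1, 0.3, 0.5]
    counts = [(40, 3000, 30, 1000), (30, 3000, 30, 1000), (20, 3000, 30, 1000)]
    per_bin = [rdnsv(*c, f=1.0) for c in counts]
    slope, intercept = fit_line(bin_midpoints, per_bin)
    # A negative slope would correspond to nsSNVs being shifted towards rare
    # variants; the intercept estimates rdnsv at an allele frequency near 0.
    print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

In this sketch, comparing gene sets amounts to running the same fit on each set's per-bin estimates and comparing the resulting intercepts and slopes.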

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Codon / genetics
  • Exome / genetics*
  • Gene Expression / immunology*
  • Gene Frequency
  • Genes / genetics*
  • Genetic Association Studies / methods
  • Genetic Variation*
  • Humans
  • Linear Models
  • Models, Genetic
  • Mutation Rate
  • Nerve Tissue Proteins / genetics*
  • Nerve Tissue Proteins / metabolism
  • Polymorphism, Single Nucleotide / genetics*
  • Selection, Genetic*

Substances

  • Codon
  • Nerve Tissue Proteins