A flexible method for estimating the fraction of fitness influencing mutations from large sequencing data sets

Sunjin Moon; Joshua M Akey

doi:10.1101/gr.203059.115

Genome Res. 2016 Jun;26(6):834-43. doi: 10.1101/gr.203059.115. Epub 2016 Apr 14.

Authors

Sunjin Moon¹, Joshua M Akey¹

Affiliation

¹ Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065, USA.

Abstract

A continuing challenge in the analysis of massively large sequencing data sets is quantifying and interpreting non-neutrally evolving mutations. Here, we describe a flexible and robust approach based on the site frequency spectrum to estimate the fraction of deleterious and adaptive variants from large-scale sequencing data sets. We applied our method to approximately 1 million single nucleotide variants (SNVs) identified in high-coverage exome sequences of 6515 individuals. We estimate that the fraction of deleterious nonsynonymous SNVs is higher than previously reported; quantify the effects of genomic context, codon bias, chromatin accessibility, and number of protein-protein interactions on deleterious protein-coding SNVs; and identify pathways and networks that have likely been influenced by positive selection. Furthermore, we show that the fraction of deleterious nonsynonymous SNVs is significantly higher for Mendelian versus complex disease loci and in exons harboring dominant versus recessive Mendelian mutations. In summary, as genome-scale sequencing data accumulate in progressively larger sample sizes, our method will enable increasingly high-resolution inferences into the characteristics and determinants of non-neutral variation.
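
For readers unfamiliar with the site frequency spectrum (SFS) mentioned in the abstract, the following minimal sketch shows how an SFS is tabulated from per-site derived allele counts and how two variant classes can be compared by their share of singletons. It is an illustration of the general concept only and is not the authors' estimator, which the abstract does not specify. The function names, the sample size, and the toy allele counts are all hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(allele_counts, n_chromosomes):
    """Tabulate an unfolded SFS.

    Returns a list `sfs` where sfs[i - 1] is the number of variant sites
    whose derived allele is carried by exactly i of the n_chromosomes
    sampled chromosomes, for i = 1 .. n_chromosomes - 1.
    """
    counts = Counter(allele_counts)
    return [counts.get(i, 0) for i in range(1, n_chromosomes)]


def singleton_fraction(sfs):
    """Fraction of variant sites seen on exactly one chromosome."""
    total = sum(sfs)
    return sfs[0] / total if total else 0.0


# Hypothetical toy data: derived allele counts per site in a sample
# of 6 chromosomes. These numbers are invented for illustration.
n = 6
synonymous = [1, 1, 2, 3, 5, 1, 4, 2]
nonsynonymous = [1, 1, 1, 1, 2, 1, 3, 1]

sfs_syn = site_frequency_spectrum(synonymous, n)
sfs_non = site_frequency_spectrum(nonsynonymous, n)

print("synonymous SFS:   ", sfs_syn, singleton_fraction(sfs_syn))
print("nonsynonymous SFS:", sfs_non, singleton_fraction(sfs_non))
```

In this toy example the nonsynonymous class has a larger share of singletons than the synonymous class. An excess of rare variants relative to a putatively neutral class is commonly interpreted as a signature of purifying selection, which is the kind of signal an SFS-based approach can draw on.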

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Codon
Evolution, Molecular*
Genetic Fitness
Models, Genetic*
Mutation*
Open Reading Frames
Polymorphism, Single Nucleotide
Protein Interaction Maps / genetics
Selection, Genetic
Sequence Analysis, DNA
Statistics, Nonparametric

Substances

Codon
