The statistics of bulk segregant analysis using next generation sequencing

PLoS Comput Biol. 2011 Nov;7(11):e1002255. doi: 10.1371/journal.pcbi.1002255. Epub 2011 Nov 3.

Abstract

We describe a statistical framework for QTL mapping using bulk segregant analysis (BSA) based on high-throughput, short-read sequencing. Our proposed approach is based on a smoothed version of the standard G statistic, and takes into account variation in allele frequency estimates due to sampling of segregants to form bulks as well as variation introduced during the sequencing of bulks. Using simulation, we explore the impact of key experimental variables such as bulk size and sequencing coverage on the ability to detect QTLs. Counterintuitively, we find that relatively large bulks maximize the power to detect QTLs even though this implies weaker selection and less extreme allele frequency differences. Our simulation studies suggest that with large bulks and sufficient sequencing depth, the methods we propose can be used to detect even weak-effect QTLs. We demonstrate the utility of this framework by application to a BSA experiment in the budding yeast Saccharomyces cerevisiae.
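
As a rough illustration of the kind of statistic the abstract refers to, the sketch below computes the standard G statistic for a single SNP from the allele counts observed in two bulks, then smooths the per-SNP values along the chromosome. The abstract does not specify the smoothing scheme, so the tricube kernel, the window parameter, and the function names g_statistic and smooth_g are assumptions made for this example, not the authors' implementation.

```python
import math

def g_statistic(a_low, b_low, a_high, b_high):
    """G statistic for a 2x2 table of allele counts (alleles A/B, low/high bulk)."""
    observed = [a_low, b_low, a_high, b_high]
    total = sum(observed)
    if total == 0:
        return 0.0  # no reads at this SNP; nothing to test
    row_low, row_high = a_low + b_low, a_high + b_high
    col_a, col_b = a_low + a_high, b_low + b_high
    # Expected counts under the null of equal allele frequency in both bulks
    expected = [
        row_low * col_a / total, row_low * col_b / total,
        row_high * col_a / total, row_high * col_b / total,
    ]
    # Cells with zero observed count contribute zero to the sum
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def smooth_g(positions, g_values, window):
    """Kernel-weighted average of G over SNPs within window/2 of each SNP.

    Assumption: a tricube kernel, chosen here purely for illustration.
    """
    half = window / 2.0
    smoothed = []
    for p in positions:
        num = den = 0.0
        for q, g in zip(positions, g_values):
            d = abs(q - p) / half
            if d < 1.0:
                w = (1.0 - d ** 3) ** 3  # tricube weight; equals 1 at the focal SNP
                num += w * g
                den += w
        smoothed.append(num / den)  # den >= 1 because the focal SNP is always included
    return smoothed

if __name__ == "__main__":
    # Hypothetical read counts at three SNPs: (A_low, B_low, A_high, B_high)
    counts = [(20, 20, 21, 19), (30, 10, 10, 30), (22, 18, 19, 21)]
    positions = [1000, 1500, 2000]
    g = [g_statistic(*c) for c in counts]
    print(g)
    print(smooth_g(positions, g, window=2000))
```

Averaging over neighbouring SNPs exploits linkage: nearby markers carry correlated signal, so the smoothed value is less sensitive to the read-sampling noise at any single site. The quadratic loop is kept for readability; a real analysis over many thousands of SNPs would want a sliding window.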

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Chromosome Mapping / statistics & numerical data*
  • Computational Biology
  • DNA, Fungal / genetics
  • Gene Frequency
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Models, Genetic
  • Models, Statistical
  • Quantitative Trait Loci*
  • Saccharomyces cerevisiae / genetics
  • Statistics, Nonparametric

Substances

  • DNA, Fungal