Efficient identification of rare variants in large populations: deep re-sequencing the CRP locus in the CARDIA study

Nucleic Acids Res. 2013 Apr;41(7):e85. doi: 10.1093/nar/gkt092. Epub 2013 Feb 13.

Abstract

The common single nucleotide polymorphisms (SNPs) identified in genome-wide association studies generally explain only a modest fraction of the total estimated heritability of a variety of traits. One hypothesis is that rare variants with larger effects might account for the missing heritability. Despite advances in sequencing technology, discovering rare variants in a large population is still economically challenging. Sequencing pooled samples can reduce the cost, but detecting rare variants in pools and identifying individual carriers are difficult and require additional experiments. To address these issues, we developed V-Sieve, an algorithm that screens for rare alleles in pooled DNA samples. In combination with a unique pooling strategy, V-Sieve can efficiently screen a candidate gene for idiosyncratic variants in thousands of samples. We applied this method to 2283 individuals and identified >100 polymorphisms in the C-reactive protein (CRP) locus at allele frequencies as low as 0.02%, with a positive predictive value of 93%. We believe this algorithm will be useful both for screening for rare variants in genomic regions known to be associated with particular phenotypes and for replicating rare-variant associations identified in large-scale studies, such as exome re-sequencing projects.
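The abstract does not describe how V-Sieve scores a site or how the pooling strategy maps pools back to carriers. As a rough illustration only, the Python sketch below shows three generic ingredients of pooled rare-variant screening: (1) the arithmetic behind the lowest frequencies involved (a single heterozygous carrier among 2283 diploid individuals corresponds to 1/(2 × 2283) ≈ 0.022%), (2) a simple binomial test of alternate-read counts against an assumed uniform sequencing error rate, and (3) decoding a hypothetical row/column pooling grid. The binomial error model, the parameters `error_rate` and `alpha`, the pool size and depth in the usage lines, and the grid layout are all assumptions chosen for illustration; none of them is V-Sieve's model or the authors' pooling design.

```python
import math


def singleton_allele_fraction(n_individuals: int, ploidy: int = 2) -> float:
    """Allele frequency contributed by one heterozygous carrier among n individuals."""
    return 1.0 / (ploidy * n_individuals)


def binomial_upper_tail(k: int, depth: int, error_rate: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, error_rate).

    Each term is computed in log space (lgamma) so that large depths do not
    overflow a float; fsum keeps the summation accurate.
    """
    if k <= 0:
        return 1.0
    if k > depth:
        return 0.0
    log_p, log_q = math.log(error_rate), math.log1p(-error_rate)
    log_norm = math.lgamma(depth + 1)
    terms = (
        math.exp(log_norm - math.lgamma(i + 1) - math.lgamma(depth - i + 1)
                 + i * log_p + (depth - i) * log_q)
        for i in range(k, depth + 1)
    )
    return math.fsum(terms)


def call_pooled_variant(alt_reads: int, depth: int,
                        error_rate: float = 1e-3, alpha: float = 1e-6) -> bool:
    """Hypothetical caller (not V-Sieve): flag a site when the alternate-read
    count is too large to be explained by sequencing error alone."""
    return binomial_upper_tail(alt_reads, depth, error_rate) < alpha


def decode_grid(positive_rows, positive_cols, n_cols: int) -> list:
    """Hypothetical row/column pooling grid: sample index = row * n_cols + col.

    Returns every sample at an intersection of a positive row pool and a
    positive column pool; the result is a single carrier only when exactly
    one row and one column are positive.
    """
    return [r * n_cols + c for r in sorted(positive_rows) for c in sorted(positive_cols)]


if __name__ == "__main__":
    # One heterozygous carrier among the 2283 study participants.
    print(f"{100 * singleton_allele_fraction(2283):.3f}%")  # 0.022%
    # Assumed pool of 50 individuals sequenced to 2000x: a singleton carrier
    # is expected to contribute about 2000 * 0.01 = 20 alternate reads.
    print(call_pooled_variant(20, 2000))  # True
    print(call_pooled_variant(3, 2000))   # False
    # Variant seen only in row pool 2 and column pool 3 of a 10-column grid.
    print(decode_grid({2}, {3}, 10))      # [23]
```

In a row/column design like this one, a singleton carrier appears in exactly one row pool and one column pool, so its identity follows from the intersection; when several carriers share a variant, the intersection yields multiple candidates that would need confirmation by individual genotyping.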

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Algorithms*
  • C-Reactive Protein / genetics*
  • Gene Frequency
  • Genome, Human
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, DNA / methods*

Substances

  • C-Reactive Protein