Accurate SNP and mutation detection by targeted custom microarray-based genomic enrichment of short-fragment sequencing libraries

Nucleic Acids Res. 2010 Jun;38(10):e116. doi: 10.1093/nar/gkq072. Epub 2010 Feb 17.

Abstract

Microarray-based enrichment of selected genomic loci is a powerful method for genome complexity reduction for next-generation sequencing. Since the vast majority of exons in vertebrate genomes are smaller than 150 nt, we explored the use of short fragment libraries (85-110 bp) to achieve higher enrichment specificity by reducing carryover and adverse effects of flanking intronic sequences. High enrichment specificity (60-75%) was obtained with relatively even base coverage. Up to 98% of the target sequence was covered more than 20x at an average coverage depth of about 200x. To verify the accuracy of SNP/mutation detection, we evaluated 384 known non-reference SNPs in the targeted regions. At approximately 200x average sequence coverage, we were able to survey 96.4% of 1.69 Mb of genomic sequence with only 4.2% false negative calls, mostly due to low coverage. Using the same settings, a total of 1197 novel candidate variants were detected. Verification experiments revealed only eight false positive calls, indicating an overall false positive rate of less than 1 per approximately 200,000 bp. Taken together, short fragment libraries provide highly efficient and flexible enrichment of exonic targets and yield relatively even base coverage, which facilitates accurate SNP and mutation detection. Raw sequencing data, alignment files and called SNPs have been submitted to the GEO database (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE18542.
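The reported error rates can be checked with simple arithmetic. The sketch below is an illustration using only figures quoted in the abstract. It assumes that the false positive rate is the eight false positives divided by the surveyed sequence (96.4% of 1.69 Mb), and that the 4.2% false negative rate applies to all 384 known SNPs. Neither assumption is stated explicitly in the abstract, and the variable names are hypothetical.

```python
# Figures quoted in the abstract.
target_bp = 1_690_000        # 1.69 Mb of targeted genomic sequence
surveyed_fraction = 0.964    # 96.4% of the target could be surveyed
false_positives = 8          # false positive calls found on verification
known_snps = 384             # known non-reference SNPs evaluated
false_negative_rate = 0.042  # 4.2% false negative calls

# Assumption: the false positive rate is expressed per surveyed base.
surveyed_bp = target_bp * surveyed_fraction
bp_per_false_positive = surveyed_bp / false_positives

# Assumption: the false negative rate is taken over all 384 known SNPs.
missed_known_snps = round(known_snps * false_negative_rate)

print(f"Surveyed sequence: {surveyed_bp:,.0f} bp")
print(f"One false positive per {bp_per_false_positive:,.0f} bp")
print(f"Known SNPs missed: about {missed_known_snps}")
```

Under these assumptions the surveyed sequence is about 1.63 Mb, which gives roughly one false positive per 204,000 bp, in line with the stated "approximately 200,000 bp", and about 16 of the 384 known SNPs missed.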

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • DNA Mutational Analysis / methods*
  • Gene Library
  • Genome, Human
  • Humans
  • Mutation
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, DNA / methods*

Associated data

  • GEO/GSE18542