Sequencing of high-complexity DNA pools for identification of nucleotide and structural variants in regions associated with complex traits

Eur J Hum Genet. 2012 Jan;20(1):77-83. doi: 10.1038/ejhg.2011.138. Epub 2011 Aug 3.

Abstract

We have used targeted genomic sequencing of high-complexity DNA pools based on long-range PCR and deep DNA sequencing by the SOLiD technology. The method was used for sequencing of 286 kb from four chromosomal regions with quantitative trait loci (QTL) influencing blood plasma lipid and uric acid levels in DNA pools of 500 individuals from each of five European populations. The method shows very good precision in estimating allele frequencies as compared with individual genotyping of SNPs (r² = 0.95, P < 10⁻¹⁶). Validation shows that the method is able to identify novel SNPs and estimate their frequency in high-complexity DNA pools. In our five populations, 17% of all SNPs and 61% of structural variants are not available in the public databases. A large fraction of the novel variants show a limited geographic distribution, with 62% of the novel SNPs and 59% of novel structural variants being detected in only one of the populations. The large number of population-specific novel SNPs underscores the need for comprehensive sequencing of local populations in order to identify the causal variants of human traits.
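The comparison between pooled and individually genotyped allele frequencies can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the pooled frequency of a SNP is estimated as the fraction of reads carrying the alternate allele, and that agreement is summarised by the squared Pearson correlation. The function names and all numbers below are hypothetical toy values, not data from the study.

```python
def pooled_allele_frequency(alt_reads, total_reads):
    """Estimate the alternate-allele frequency in a DNA pool as the
    fraction of reads that carry the alternate allele (an assumption
    made for this sketch, not a method stated in the abstract)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical (alt_reads, total_reads) per SNP from pooled sequencing.
read_counts = [(120, 1000), (480, 1200), (45, 900), (700, 1000)]

# Hypothetical frequencies for the same SNPs from individual genotyping.
genotyped = [0.11, 0.41, 0.06, 0.68]

pooled = [pooled_allele_frequency(alt, total) for alt, total in read_counts]
print(round(r_squared(pooled, genotyped), 3))
```

An r² close to 1 on real data would indicate that read fractions in the pool track the genotype-based frequencies closely. A real analysis would also need to account for sequencing error and uneven amplification, which this sketch ignores.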

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromosomes, Human / genetics
  • Cohort Studies
  • Computational Biology
  • Gene Frequency
  • Genetic Testing / methods
  • Genome, Human
  • Genomic Structural Variation*
  • Genotype
  • Glucose Transport Proteins, Facilitative / genetics
  • Humans
  • INDEL Mutation*
  • Lipase / genetics
  • Polymorphism, Single Nucleotide*
  • Quantitative Trait Loci*
  • Sensitivity and Specificity
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*

Substances

  • Glucose Transport Proteins, Facilitative
  • LIPC protein, human
  • SLC2A9 protein, human
  • Lipase