The effects of demography and long-term selection on the accuracy of genomic prediction with sequence data

Genetics. 2014 Dec;198(4):1671-84. doi: 10.1534/genetics.114.168344. Epub 2014 Sep 18.

Abstract

The use of dense SNPs to predict the genetic value of an individual for a complex trait is often referred to as "genomic selection" in livestock and crops, but it is also relevant in human genetics, for example to predict risk of complex genetic disease. The accuracy of prediction depends on the strength of linkage disequilibrium (LD) between SNPs and causal mutations. If sequence data were used instead of dense SNPs, accuracy should increase because the causal mutations themselves are present in the data, but demographic history and long-term negative selection also influence accuracy. We therefore evaluated genomic prediction using simulated sequence data in two contrasting populations: the first declined from an ancestrally large effective population size (Ne) to a small one, giving the high LD common in domestic livestock, while the second had a large, constant Ne with low LD similar to that in some human or outbred plant populations. Each population was simulated under two scenarios: causal variants were either neutral or under long-term negative selection. For large Ne, sequence data led to a 22% increase in accuracy relative to ∼600K SNP chip data with a Bayesian analysis and a more modest advantage with a BLUP analysis. This advantage increased when causal variants were influenced by negative selection, and accuracy persisted when 10 generations separated the reference and validation populations. However, in the declining-Ne population, sequence data offered little advantage, even with negative selection. This study demonstrates the joint influence of demography and selection on the accuracy of prediction and improves our understanding of how best to exploit sequence data for genomic prediction.
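
Because the abstract ties prediction accuracy to the strength of LD between SNPs and causal mutations, a small illustration of how LD is commonly quantified may help. The sketch below computes the standard squared-correlation measure (r^2) between two biallelic loci from phased haplotypes coded 0/1. It is not taken from the paper: the function name `ld_r2` and the toy haplotypes are hypothetical, and the abstract does not state which LD statistic the authors used.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic loci.

    Inputs are equal-length lists of phased haplotype alleles coded 0/1
    (an assumption of this sketch, not something stated in the abstract).
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equal in length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                   # allele frequency at locus A
    p_b = sum(hap_b) / n                                   # allele frequency at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n    # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                                         # a monomorphic locus carries no LD signal
    return d * d / denom


# Hypothetical toy data: one causal variant and three candidate marker SNPs.
causal = [1, 1, 1, 1, 0, 0, 0, 0]
snp_perfect = [1, 1, 1, 1, 0, 0, 0, 0]   # always co-occurs with the causal allele
snp_partial = [1, 1, 1, 0, 1, 0, 0, 0]   # partly associated
snp_none = [1, 0, 1, 0, 1, 0, 1, 0]      # statistically independent of the causal allele

for name, snp in [("perfect", snp_perfect), ("partial", snp_partial), ("none", snp_none)]:
    print(name, ld_r2(causal, snp))
```

A marker with r^2 near 1 is an almost perfect proxy for the causal mutation, whereas a marker with r^2 near 0 carries essentially no information about it. That is the intuition behind the abstract's expectation that sequence data, which contain the causal mutations themselves, should matter most when LD between chip SNPs and causal variants is low.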

Keywords: GenPred; genomic selection; high-density SNP; shared data resource; whole-genome sequence.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alleles
  • Animals
  • Computer Simulation
  • Datasets as Topic
  • Gene Frequency
  • Genetic Predisposition to Disease*
  • Genetic Variation
  • Genetics, Population
  • Genomics / methods*
  • Genotype
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic*
  • Models, Statistical
  • Mutation
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Quantitative Trait Loci
  • Quantitative Trait, Heritable
  • Reproducibility of Results
  • Selection, Genetic