Reliability of genomic predictions across multiple populations

Genetics. 2009 Dec;183(4):1545-53. doi: 10.1534/genetics.109.104935. Epub 2009 Oct 12.

Abstract

Genomic prediction of future phenotypes or genetic merit using dense SNP genotypes can be used for prediction of disease risk, forensics, and genomic selection of livestock and domesticated plant species. The reliability of genomic predictions is their squared correlation with the true genetic merit and indicates the proportion of the genetic variance that is explained. As reliability relies heavily on the number of phenotypes, combining data sets from multiple populations may be attractive as a way to increase reliabilities, particularly when phenotypes are scarce. However, this strategy may also decrease reliabilities if the marker effects are very different between the populations. The effect of combining multiple populations on the reliability of genomic predictions was assessed for two simulated cattle populations, A and B, that had diverged for T = 6, 30, or 300 generations. The training set comprised phenotypes of 1000 individuals from population A and 0, 300, 600, or 1000 individuals from population B, while marker density and trait heritability were varied. Adding individuals from population B to the training set increased the reliability in population A by up to 0.12 when the marker density was high and T = 6, whereas it decreased the reliability in population A by up to 0.07 when the marker density was low and T = 300. Without individuals from population B in the training set, the reliability in population B was up to 0.77 lower than in population A, especially for large T. Adding individuals from population B to the training set increased the reliability in population B to close to the same level as in population A when the marker density was sufficiently high for the marker-QTL linkage disequilibrium to persist across populations. Our results suggest that the most accurate genomic predictions are achieved when phenotypes from all populations are combined in one training set, while for more diverged populations a higher marker density is required.
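
The abstract defines reliability as the squared correlation between genomic predictions and true genetic merit. A minimal sketch of that calculation follows. The function name `reliability`, the toy data, and the noise levels are illustrative assumptions, not taken from the paper. In real data the true genetic merit is not observed, so this direct calculation is only possible in simulations such as the one described.

```python
import random


def reliability(predicted, true_merit):
    """Squared Pearson correlation between predictions and true genetic merit."""
    n = len(predicted)
    if n != len(true_merit) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_t = sum(true_merit) / n
    # Sums of squares and cross-products around the means.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, true_merit))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in true_merit)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov * cov / (var_p * var_t)


# Toy data (assumption): simulated true merit plus noisy predictions of it.
rng = random.Random(0)
true_merit = [rng.gauss(0.0, 1.0) for _ in range(1000)]
predicted = [0.8 * t + rng.gauss(0.0, 0.6) for t in true_merit]

r2 = reliability(predicted, true_merit)
print(f"reliability = {r2:.2f}")
```

Under this definition, a change in reliability such as the 0.12 increase reported above is a change in the proportion of genetic variance explained by the predictions.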

MeSH terms

  • Animals
  • Breeding
  • Cattle
  • Genetic Markers / genetics
  • Genomics*
  • Linkage Disequilibrium
  • Models, Genetic
  • Quantitative Trait Loci
  • Reproducibility of Results

Substances

  • Genetic Markers