Accuracy of Genomic Prediction in Synthetic Populations Depending on the Number of Parents, Relatedness, and Ancestral Linkage Disequilibrium

Pascal Schopp; Dominik Müller; Frank Technow; Albrecht E Melchinger

doi:10.1534/genetics.116.193243

Genetics. 2017 Jan;205(1):441-454. doi: 10.1534/genetics.116.193243. Epub 2016 Nov 9.

Authors

Pascal Schopp¹, Dominik Müller¹, Frank Technow¹, Albrecht E Melchinger²

Affiliations

¹ Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599 Stuttgart, Germany.
² Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599 Stuttgart, Germany. melchinger@uni-hohenheim.de

Abstract

Synthetics play an important role in quantitative genetic research and plant breeding, but few studies have investigated the application of genomic prediction (GP) to these populations. Synthetics are generated by intermating a small number of parents ([Formula: see text]) and thereby possess unique genetic properties, which make them especially suited for systematic investigations of factors contributing to the accuracy of GP. We generated synthetics in silico from [Formula: see text] 2 to 32 maize (Zea mays L.) lines taken from an ancestral population with either short- or long-range linkage disequilibrium (LD). In eight scenarios differing in relatedness of the training and prediction sets and in the types of data used to calculate the relationship matrix (QTL, SNPs, tag markers, and pedigree), we investigated the prediction accuracy (PA) of genomic best linear unbiased prediction (GBLUP) and analyzed contributions from pedigree relationships captured by SNP markers, as well as from cosegregation and ancestral LD between QTL and SNPs. The effects of training set size [Formula: see text] and marker density were also studied. Sampling few parents ([Formula: see text]) generates substantial sample LD that carries over into synthetics through cosegregation of alleles at linked loci. For fixed [Formula: see text], [Formula: see text] influences PA most strongly. If the training and prediction sets are related, using [Formula: see text] parents yields high PA regardless of ancestral LD because SNPs capture pedigree relationships and Mendelian sampling through cosegregation. As [Formula: see text] increases, ancestral LD contributes more information, while other factors contribute less due to lower frequencies of closely related individuals. For unrelated prediction sets, only ancestral LD contributes information and accuracies were poor and highly variable for [Formula: see text] due to large sample LD. For large [Formula: see text], achieving moderate accuracy requires large [Formula: see text], long-range ancestral LD, and high marker density. Our approach for analyzing PA in synthetics provides new insights into the prospects of GP for many types of source populations encountered in plant breeding.
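
As an illustration of the GBLUP setup the abstract refers to, the following is a minimal sketch in Python, not the authors' implementation. It assumes a VanRaden-style scaling of the SNP-based relationship matrix, a known heritability `h2`, and a simple phenotypic mean as the fixed effect; the abstract specifies none of these. The function names `genomic_relationship` and `gblup_predict` and the toy genotypes (independent loci, so no LD or pedigree structure) are hypothetical and only demonstrate the mechanics.

```python
import numpy as np


def genomic_relationship(M):
    """Relationship matrix from an (n x m) matrix of 0/1/2 allele counts.

    Assumption: VanRaden-style centering and scaling with sample allele
    frequencies; the abstract does not state which scaling was used.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0             # sample allele frequency per marker
    Z = M - 2.0 * p                      # center each marker column
    scale = 2.0 * np.sum(p * (1.0 - p))  # expected genotype variance summed over markers
    return Z @ Z.T / scale


def gblup_predict(G, y_train, train, test, h2):
    """Predict genetic values of `test` individuals from `train` phenotypes.

    Simplification: the overall mean is estimated by the phenotypic mean
    of the training set, and h2 is treated as known.
    """
    lam = (1.0 - h2) / h2                # residual-to-genetic variance ratio
    mu = y_train.mean()
    G_tt = G[np.ix_(train, train)]       # training x training block
    G_pt = G[np.ix_(test, train)]        # prediction x training block
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y_train - mu)
    return mu + G_pt @ alpha


# Toy data (hypothetical): independent loci, every marker acts as a QTL.
rng = np.random.default_rng(1)
n, m, h2 = 500, 100, 0.8
freqs = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, freqs, size=(n, m))
g = M @ rng.normal(size=m)               # true genetic values
y = g + rng.normal(scale=np.sqrt(g.var() * (1.0 - h2) / h2), size=n)

train, test = np.arange(400), np.arange(400, 500)
G = genomic_relationship(M)
g_hat = gblup_predict(G, y[train], train, test, h2)

# Prediction accuracy: correlation of predicted and true genetic values.
print("PA:", np.corrcoef(g_hat, g[test])[0, 1])
```

Swapping the input of `genomic_relationship` for a different data type (for example, genotypes at QTL only, or a subset of tag markers) would mimic, in spirit, the comparison of relationship matrices across scenarios described in the abstract.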

Keywords: GBLUP; GenPred; Shared data resource; genetic relationships; genomic prediction; genomic selection; linkage disequilibrium; synthetic populations.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Alleles
Computer Simulation
Forecasting
Genomics / methods*
Linkage Disequilibrium*
Models, Genetic*
Models, Statistical
Pedigree
Phenotype
Plant Breeding
Polymorphism, Single Nucleotide
Quantitative Trait Loci
Zea mays / genetics