Efficient Use of Historical Data for Genomic Selection: A Case Study of Stem Rust Resistance in Wheat

Plant Genome. 2015 Mar;8(1):eplantgenome2014.09.0046. doi: 10.3835/plantgenome2014.09.0046.

Abstract

Genomic selection (GS) is a methodology that can improve crop breeding efficiency. To implement GS, a training population (TP) with phenotypic and genotypic data is required to train a statistical model used to predict genotyped selection candidates (SCs). A key factor impacting prediction accuracy is the relationship between the TP and the SCs. This study used empirical data for quantitative adult plant resistance to stem rust of wheat (Triticum aestivum L.) to investigate the utility of a historical TP (TPH) compared with a population-specific TP (TPPS), the potential for TPH optimization, and the utility of TPH data when data on close relatives are available for training. We found that, depending on the population size, a TPPS was 1.5 to 4.4 times more accurate than a TPH, and TPH optimization based on the mean of the generalized coefficient of determination or prediction error variance enabled the selection of subsets that led to significantly higher accuracy than randomly selected subsets. Retaining historical data when data on close relatives were available led to an 11.9% increase in accuracy, at best, and a 12% decrease in accuracy, at worst, depending on the heritability. We conclude that historical data could be used successfully to initiate a GS program, especially if the dataset is very large and of high heritability. Training population optimization would be useful for the identification of TPH subsets to phenotype additional traits. However, after model updating, discarding historical data may be warranted. More studies are needed to determine if these observations represent general trends.
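
The two optimization criteria named above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a mixed model with an intercept as the only fixed effect, a positive-definite relationship matrix A covering all individuals, a known variance ratio lam (residual variance divided by genetic variance), and contrasts of each individual against the population mean. The function names cd_and_pev and cd_mean are hypothetical.

```python
import numpy as np

def cd_and_pev(A, train_idx, lam):
    """Per-individual generalized CD and PEV for a candidate training set.

    A         : (n, n) positive-definite relationship matrix (assumed given)
    train_idx : indices of the individuals that would be phenotyped
    lam       : assumed variance ratio sigma_e^2 / sigma_g^2
    PEV is returned in units of the residual variance.
    """
    n = A.shape[0]
    n_t = len(train_idx)
    # Incidence matrix linking training records to individuals
    Z = np.zeros((n_t, n))
    Z[np.arange(n_t), train_idx] = 1.0
    # M removes the fixed effects; here only an intercept is assumed
    M = np.eye(n_t) - np.ones((n_t, n_t)) / n_t
    # Genetic block of the inverse mixed-model coefficient matrix
    C22 = np.linalg.inv(Z.T @ M @ Z + lam * np.linalg.inv(A))
    # Contrast of each individual against the population mean
    T = np.eye(n) - np.ones((n, n)) / n
    pev = np.diag(T.T @ C22 @ T) / np.diag(T.T @ T)
    cd = np.diag(T.T @ (A - lam * C22) @ T) / np.diag(T.T @ A @ T)
    return cd, pev

def cd_mean(A, train_idx, lam):
    """Mean CD over the individuals that are not in the training set."""
    cd, _ = cd_and_pev(A, train_idx, lam)
    candidates = np.setdiff1d(np.arange(A.shape[0]), train_idx)
    return cd[candidates].mean()
```

Under these assumptions, a subset search would compare cd_mean (to be maximized) or the mean PEV (to be minimized) across candidate training subsets, for example by an exchange algorithm that swaps individuals in and out of the subset.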