Optimizing Training Population Size and Genotyping Strategy for Genomic Prediction Using Association Study Results and Pedigree Information. A Case of Study in Advanced Wheat Breeding Lines

PLoS One. 2017 Jan 12;12(1):e0169606. doi: 10.1371/journal.pone.0169606. eCollection 2017.

Abstract

Wheat breeding programs generate a large amount of variation that cannot be fully explored because phenotyping throughput is limited. Genomic prediction (GP) has been proposed as a tool that provides estimates of breeding values without phenotyping all the material produced, only a subset of it called the training population (TP). However, all the accessions under analysis must still be genotyped; therefore, optimizing TP size and genotyping strategy is pivotal for implementing GP in commercial breeding schemes. Here, we explored the optimum TP size and integrated pedigree records and genome-wide association study (GWAS) results to optimize the genotyping strategy. A total of 988 advanced wheat breeding lines were genotyped with the Illumina 15K SNP wheat chip and phenotyped across several years and locations for yield, lodging, and starch content. Cross-validation using the largest possible TP size and all the SNPs available after editing (~11K) yielded predictive abilities (rGP) between 0.5 and 0.6. To explore the effect of TP size, rGP was computed using progressively smaller TPs. These analyses showed that TPs of around 700 lines were enough to yield the highest observed rGP. rGP was also calculated after randomly reducing the number of SNPs, which showed that around 1K markers were enough to reach the highest observed rGP. GWAS was used to identify markers associated with the traits analyzed. GWAS-based selection of SNPs increased rGP compared with random selection, and a few hundred SNPs were sufficient to obtain the highest observed rGP. For each of these scenarios, adding the pedigree information was shown to be advantageous. Our results indicate that moderate TP sizes were enough to yield high rGP and that pedigree information and GWAS results can be used to greatly optimize the genotyping strategy.
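
The resampling design described in the abstract (shrinking the TP, shrinking the marker set, and comparing random with association-based marker choice) can be illustrated with a small simulation. The sketch below is not the authors' pipeline: the abstract does not name the prediction model, so ridge regression on marker genotypes stands in for it, the data are simulated, pedigree information is left out, and every name and default (`simulate`, `ridge_predict`, `predictive_ability`, `lam`, the 20% validation fraction, the single-marker correlation used as a stand-in for GWAS) is an assumption made for illustration only.

```python
import numpy as np


def simulate(n_lines=300, n_snps=200, n_qtl=20, h2=0.5, seed=0):
    """Simulate 0/1/2 genotypes and a phenotype with heritability h2 (toy data)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, n_snps)
    X = rng.binomial(2, freqs, size=(n_lines, n_snps)).astype(float)
    effects = np.zeros(n_snps)
    qtl = rng.choice(n_snps, n_qtl, replace=False)
    effects[qtl] = rng.normal(0.0, 1.0, n_qtl)
    g = X @ effects
    noise_sd = np.sqrt(g.var() * (1.0 - h2) / h2)
    y = g + rng.normal(0.0, noise_sd, n_lines)
    return X, y


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on centered markers, solved in its n x n dual form."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    y_mean = y_train.mean()
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - y_mean)
    beta = Xc.T @ alpha  # estimated marker effects
    return y_mean + (X_test - mu) @ beta


def predictive_ability(X, y, tp_size, n_snps, select="random",
                       lam=50.0, n_reps=20, seed=1):
    """Mean correlation between predicted and observed phenotypes in a
    held-out validation set, over repeated random splits."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    n_test = n // 5  # assumption: 20% of lines held out for validation
    if tp_size > n - n_test or n_snps > m:
        raise ValueError("tp_size or n_snps exceeds the available data")
    cors = []
    for _ in range(n_reps):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:n_test + tp_size]
        if select == "gwas":
            # Crude stand-in for GWAS: rank markers by absolute correlation
            # with the phenotype, using training lines only.
            Xc = X[train] - X[train].mean(axis=0)
            yc = y[train] - y[train].mean()
            score = np.abs(Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0)) + 1e-12)
            snps = np.argsort(score)[-n_snps:]
        else:
            snps = rng.choice(m, n_snps, replace=False)
        pred = ridge_predict(X[np.ix_(train, snps)], y[train],
                             X[np.ix_(test, snps)], lam)
        cors.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(cors))


if __name__ == "__main__":
    X, y = simulate()
    for tp in (30, 60, 120, 240):
        print("TP size", tp, "rGP", round(predictive_ability(X, y, tp, 200), 3))
    for k in (20, 50, 200):
        print("SNPs", k,
              "random", round(predictive_ability(X, y, 240, k), 3),
              "gwas", round(predictive_ability(X, y, 240, k, select="gwas"), 3))
```

Ranking the markers inside each training split, rather than on the full data set, keeps the validation lines out of marker selection. The penalty `lam` is fixed here for brevity; in practice it would be estimated from variance components or tuned.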

MeSH terms

  • Genome, Plant*
  • Genome-Wide Association Study
  • Genotyping Techniques / methods*
  • Plant Breeding*
  • Polymorphism, Single Nucleotide*
  • Triticum / genetics*

Grants and funding

This work has been funded by the Danish Ministry for Food, Agriculture and Fisheries under the “Program for Green Development and Demonstration” (Grønt Udviklings- og Demonstrationsprogram – GUDP – Award Number: 34009-13-0607). This funder provided support in the form of salaries for authors FC, LLJ, and JJ, and half of the salary of JO, JRA, and AJ. Moreover, GUDP financed half of the cost associated with the data production, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of authors are articulated in the ‘author contributions’ section. Additional funding was provided by the commercial partner Nordic Seed A/S in the form of half of the salaries for JO, JRA, and AJ, as well as for the production of phenotypic and genotypic data. Authors funded by Nordic Seed A/S participated in decisions on study design, data collection and analysis, and preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.