Training set design in genomic prediction with multiple biparental families

Plant Genome. 2021 Nov;14(3):e20124. doi: 10.1002/tpg2.20124. Epub 2021 Jul 24.

Abstract

Genomic selection is a powerful tool to reduce the cycle length and enhance the genetic gain of complex traits in plant breeding. However, questions remain about the optimum design and composition of the training set. In this study, we used 944 soybean [Glycine max (L.) Merr.] recombinant inbred lines from eight families derived through a partial-diallel mating design among five parental lines. The cross-validated prediction accuracies for the six traits (seed yield, 1,000-seed weight, protein yield, plant height, protein content, and oil content) were high, ranging from 0.79 to 0.87. We investigated among-family predictions, making use of the special mating design with different degrees of relatedness among families. Generally, the prediction accuracy decreased from full-sib to half-sib to unrelated families. However, half-sib and unrelated families also showed substantial variation in their prediction accuracy for a given family, which appeared to be caused at least in part by the shared segregation of quantitative trait loci in both the training and prediction sets. Combining several half-sib families in composite training sets generally led to an increase in the prediction accuracy compared with the best family alone. The prediction accuracy increased with the size of the training set, but for comparable prediction accuracy, substantially more half-sibs were required than full-sibs. Collectively, our results highlight the potential of genomic selection for soybean breeding and, in a broader context, illustrate the importance of the targeted design of the training set.
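The among-family comparison described above can be illustrated with a minimal sketch on simulated data. It is not the authors' pipeline: the ridge-regression model with a fixed penalty, the treatment of markers as unlinked, and every size, seed, and name (`make_family`, `ridge_effects`, `accuracy`, `lam`) are assumptions chosen for illustration. Prediction accuracy is taken here as the correlation between predicted and observed phenotypes in the target family.

```python
import numpy as np

rng = np.random.default_rng(0)
N_MARKERS, N_QTL, N_LINES = 300, 30, 120  # hypothetical sizes

# Five fully inbred parents coded 0/2 at each marker (illustrative only).
parents = 2 * rng.integers(0, 2, size=(5, N_MARKERS))
qtl_effects = np.zeros(N_MARKERS)
qtl_effects[rng.choice(N_MARKERS, N_QTL, replace=False)] = rng.normal(size=N_QTL)

def make_family(p1, p2, n=N_LINES, noise_sd=3.0):
    """Simulate n inbred lines from p1 x p2; markers are treated as unlinked."""
    from_p1 = rng.integers(0, 2, size=(n, N_MARKERS)).astype(bool)
    X = np.where(from_p1, parents[p1], parents[p2]).astype(float)
    y = X @ qtl_effects + rng.normal(scale=noise_sd, size=n)
    return X, y

def ridge_effects(X, y, lam=50.0):
    """Ridge-regression marker effects with a fixed (assumed) penalty lam."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Dual form: an n x n system, cheaper when markers outnumber lines.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(X.shape[0]), yc)
    return Xc.T @ alpha

def accuracy(train, test, lam=50.0):
    """Correlation between predicted and observed values in the test set."""
    (X_tr, y_tr), (X_te, y_te) = train, test
    beta = ridge_effects(X_tr, y_tr, lam)
    pred = (X_te - X_tr.mean(axis=0)) @ beta
    return float(np.corrcoef(pred, y_te)[0, 1])

# Target family: parent 0 x parent 1; half is held out as the prediction set.
target_X, target_y = make_family(0, 1)
half = N_LINES // 2
test_set = (target_X[half:], target_y[half:])

hs_a = make_family(0, 2)  # shares parent 0 with the target family
hs_b = make_family(1, 3)  # shares parent 1 with the target family
training_sets = {
    "full-sib": (target_X[:half], target_y[:half]),
    "half-sib": hs_a,
    "composite half-sib": (np.vstack([hs_a[0], hs_b[0]]),
                           np.concatenate([hs_a[1], hs_b[1]])),
    "unrelated": make_family(2, 3),  # shares no parent with the target
}

acc = {name: accuracy(train, test_set) for name, train in training_sets.items()}
for name, value in acc.items():
    print(f"{name:>20s}: r = {value:.2f}")
```

In this toy setting, a training family can only inform the markers that segregate in it, so its usefulness for the target family depends on how many quantitative trait loci segregate in both, which mirrors the explanation offered in the abstract. The numbers it prints depend entirely on the simulation settings and say nothing about the soybean data.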

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genome, Plant*
  • Genomics / methods
  • Humans
  • Phenotype
  • Plant Breeding*
  • Quantitative Trait Loci