Building a Calibration Set for Genomic Prediction, Characteristics to Be Considered, and Optimization Approaches

Simon Rio; Alain Charcosset; Tristan Mary-Huard; Laurence Moreau; Renaud Rincent

doi:10.1007/978-1-0716-2205-6_3

Methods Mol Biol. 2022:2467:77-112. doi: 10.1007/978-1-0716-2205-6_3.

Authors

Simon Rio¹ ², Alain Charcosset³, Tristan Mary-Huard³, Laurence Moreau⁴, Renaud Rincent³

Affiliations

¹ CIRAD, UMR AGAP Institut, Montpellier, France.
² UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France.
³ GQE-Le Moulon, INRAE, University Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France.
⁴ GQE-Le Moulon, INRAE, University Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France. laurence.moreau@inrae.fr.

PMID: 35451773
DOI: 10.1007/978-1-0716-2205-6_3

Abstract

The efficiency of genomic selection strongly depends on the accuracy with which the genetic merit of candidates is predicted. Numerous papers have shown that the composition of the calibration set is a key contributor to prediction accuracy. A poorly defined calibration set can result in low accuracies, whereas an optimized one can considerably increase accuracy compared to random sampling at the same size. Alternatively, optimizing the calibration set can reduce phenotyping costs by achieving accuracy similar to that of random sampling with fewer phenotypic units. We present here the factors that have to be considered when designing a calibration set and review the criteria proposed in the literature. We classify these criteria into two groups: model-free criteria based on relatedness, and criteria derived from the linear mixed model. We introduce criteria targeting specific prediction objectives, including the prediction of highly diverse panels, biparental families, or hybrids. We also review different ways of updating the calibration set and different procedures for optimizing phenotyping experimental designs.

Keywords: CDmean; Calibration population; Genomic selection; Optimization; PEVmean; Prediction accuracy.

MeSH terms

Calibration
Genome, Plant*
Genomics* / methods
Genotype
Humans
Models, Genetic
Phenotype
Polymorphism, Single Nucleotide