Genomic selection using principal component regression

Heredity (Edinb). 2018 Jul;121(1):12-23. doi: 10.1038/s41437-018-0078-x. Epub 2018 May 1.

Abstract

Many statistical methods are available for genomic selection (GS), in which genetic values of quantitative traits are predicted for plants and animals using whole-genome SNP data. Having far more predictors than subjects is a major computational challenge in GS. Principal components regression (PCR) and its derivative, partial least squares regression (PLSR), provide a solution through dimensionality reduction. In this study, we show that PCR can perform better than PLSR in cross validation. PCR often requires more components than PLSR to reach its maximum predictive ability and thus may be associated with a higher computational cost. However, applying the HAT method (a strategy that describes the relationship between the fitted and observed response variables with a hat matrix) to PCR circumvents conventional cross validation when testing predictive ability, resulting in substantially better computational efficiency than PLSR, for which cross validation is mandatory. The advantages of PCR over PLSR are illustrated with a simulated trait in a hypothetical population and four agronomic traits in a rice population. The benefit of using PCR in genomic selection is further demonstrated in an effort to predict 1000 metabolomic traits and 24,973 transcriptomic traits in the same rice population.
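
The abstract describes replacing conventional cross validation with a hat matrix applied to PCR. The following is a minimal sketch of that general idea in Python with NumPy, not the authors' implementation. It assumes that the principal component scores are extracted once from all samples, that an intercept is fitted, and that predictive ability is summarized as 1 - PRESS/SS_total; the function name `pcr_hat` and the simulated data are hypothetical. Under these assumptions the leave-one-out residual for sample i is the ordinary residual divided by (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix, so no model refitting is needed.

```python
import numpy as np

def pcr_hat(X, y, k):
    """PCR with k components plus hat-matrix leave-one-out predictions.

    Assumption: the component scores are computed once from all samples,
    so the leave-one-out shortcut treats them as a fixed design matrix.
    Requires k smaller than the rank of the centered marker matrix.
    """
    n = len(y)
    Xc = X - X.mean(axis=0)                       # center the markers
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Uk = U[:, :k]                                 # orthonormal score directions
    y_mean = y.mean()
    fitted = y_mean + Uk @ (Uk.T @ (y - y_mean))  # y_hat = H y
    # Diagonal of H = (1/n) 11' + Uk Uk' (intercept plus k components).
    leverage = 1.0 / n + np.sum(Uk ** 2, axis=1)
    loo_resid = (y - fitted) / (1.0 - leverage)   # leave-one-out residuals
    press = np.sum(loo_resid ** 2)
    ss_total = np.sum((y - y_mean) ** 2)
    return {
        "fitted": fitted,
        "leverage": leverage,
        "loo_pred": y - loo_resid,
        "predictability": 1.0 - press / ss_total,
    }

# Hypothetical simulated data: 25 individuals, 120 SNPs coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(25, 120)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(size=25)

# Scan the number of components without any cross-validation loop.
scores = {k: pcr_hat(X, y, k)["predictability"] for k in range(1, 15)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

Because each candidate number of components costs only one matrix product after a single SVD, scanning many values of k is cheap. Note that re-extracting the components within each fold, as a conventional cross validation would, can give somewhat different results from this shortcut.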

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Simulation
  • Genomics* / methods
  • Models, Genetic*
  • Oryza / genetics
  • Phenotype
  • Principal Component Analysis*
  • Quantitative Trait Loci
  • Quantitative Trait, Heritable
  • Regression Analysis*
  • Selection, Genetic*