Genomic selection using principal component regression

Caroline Du; Julong Wei; Shibo Wang; Zhenyu Jia

doi:10.1038/s41437-018-0078-x

Heredity (Edinb). 2018 Jul;121(1):12-23. doi: 10.1038/s41437-018-0078-x. Epub 2018 May 1.

Authors

Caroline Du¹, Julong Wei¹ ², Shibo Wang¹, Zhenyu Jia³

Affiliations

¹ Department of Botany and Plant Sciences, University of California, Riverside, CA, 92521, USA.
² College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu, China.
³ Department of Botany and Plant Sciences, University of California, Riverside, CA, 92521, USA. arthur.jia@ucr.edu.

Abstract

Many statistical methods are available for genomic selection (GS), in which genetic values of quantitative traits in plants and animals are predicted from whole-genome SNP data. Having far more predictors than subjects poses a major computational challenge in GS. Principal component regression (PCR) and its derivative, partial least squares regression (PLSR), address this challenge through dimensionality reduction. In this study, we show that PCR can perform better than PLSR in cross validation. PCR often requires more components than PLSR to reach its maximum predictive ability and may therefore incur a higher computational cost. However, applying the HAT method (a strategy that describes the relationship between the fitted and observed response variables with a hat matrix) to PCR circumvents conventional cross validation when testing predictive ability, making PCR substantially more computationally efficient than PLSR, for which cross validation is mandatory. The advantages of PCR over PLSR are illustrated with a simulated trait in a hypothetical population and four agronomic traits in a rice population. The benefit of using PCR in genomic selection is further demonstrated by predicting 1000 metabolomic traits and 24,973 transcriptomic traits in the same rice population.
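
The hat-matrix idea mentioned in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the principal component scores are computed once from all samples and then treated as a fixed design matrix, so that the leave-one-out prediction for sample i follows from the standard linear-model identity y_i - e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii is the i-th diagonal element of the hat matrix. The function name `pcr_hat_loo`, the simulated data, and the use of the squared correlation as a measure of predictive ability are all assumptions made for illustration.

```python
import numpy as np


def pcr_hat_loo(X, y, k):
    """PCR with k components plus hat-matrix leave-one-out predictions.

    Assumption of this sketch: the PC scores are computed once from all
    samples and then held fixed, so no model is refitted per fold.
    """
    # Center the marker matrix and extract the first k PC scores via SVD.
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]

    # Design matrix: intercept column plus the k PC scores.
    Z = np.column_stack([np.ones(len(y)), scores])

    # With Z = QR (Q has orthonormal columns), the hat matrix is H = Q Q^T.
    Q, _ = np.linalg.qr(Z)
    leverage = np.sum(Q**2, axis=1)   # diagonal of H
    fitted = Q @ (Q.T @ y)            # H y

    # Leave-one-out prediction without refitting: y_i - e_i / (1 - h_ii).
    loo_pred = y - (y - fitted) / (1.0 - leverage)
    return fitted, loo_pred, leverage


# Hypothetical example: 50 individuals, 500 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(50, 500)).astype(float)
effects = rng.normal(scale=0.1, size=500)
y = X @ effects + rng.normal(size=50)

fitted, loo_pred, leverage = pcr_hat_loo(X, y, k=10)

# Squared correlation between observed values and leave-one-out predictions,
# used here as one possible measure of predictive ability.
r2 = np.corrcoef(y, loo_pred)[0, 1] ** 2
print(f"squared correlation (hat-based leave-one-out): {r2:.3f}")
```

Because the leverages come from a single decomposition, scanning several values of `k` costs one SVD plus a cheap projection per value, rather than one refit per held-out sample.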

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Computer Simulation
Genomics* / methods
Models, Genetic*
Oryza / genetics
Phenotype
Principal Component Analysis*
Quantitative Trait Loci
Quantitative Trait, Heritable
Regression Analysis*
Selection, Genetic*