Enhancing genome-enabled prediction by bagging genomic BLUP

PLoS One. 2014 Apr 10;9(4):e91693. doi: 10.1371/journal.pone.0091693. eCollection 2014.

Abstract

We examined whether the predictive ability of genomic best linear unbiased prediction (GBLUP) could be improved via a resampling method used in machine learning: bootstrap aggregating ("bagging"). In theory, bagging can be useful when the predictor has large variance or when the number of markers is much larger than sample size, preventing effective regularization. After presenting a brief review of GBLUP, bagging was adapted to the context of GBLUP, both at the level of the genetic signal and of marker effects. The performance of bagging was evaluated with four simulated case studies including known or unknown quantitative trait loci, and an application was made to real data on grain yield in wheat planted in four environments. A metric intended to quantify candidate-specific cross-validation uncertainty was proposed and assessed; as expected, model-derived theoretical reliabilities bore no relationship with cross-validation accuracy. It was found that bagging can improve the predictive performance of GBLUP and make it more robust against over-fitting. Seemingly, 25-50 bootstrap samples were enough to attain reasonable predictions as well as stable measures of individual predictive mean squared errors.
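As an illustration of the general idea only, bagging a GBLUP-type predictor can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: GBLUP is written in its equivalent ridge-regression form on centered markers, the shrinkage parameter `lam` (the ratio of residual to marker-effect variance) is assumed to be known, and the function names `gblup_predict` and `bagged_gblup` are hypothetical. The per-candidate spread across bootstrap replicates computed in the usage example is a simple illustrative quantity and is not claimed to be the uncertainty metric proposed in the paper.

```python
import numpy as np


def gblup_predict(X_train, y_train, X_test, lam):
    """Predict test phenotypes with a ridge-regression form of GBLUP.

    lam is an assumed, user-supplied shrinkage parameter
    (residual variance divided by marker-effect variance).
    """
    # Center markers and phenotypes using training means only.
    mu_x = X_train.mean(axis=0)
    mu_y = y_train.mean()
    Z = X_train - mu_x
    Z_test = X_test - mu_x
    # Dual (n x n) form: g_test = Z_test Z' (Z Z' + lam I)^{-1} (y - mu).
    K = Z @ Z.T
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y_train - mu_y)
    return mu_y + Z_test @ (Z.T @ alpha)


def bagged_gblup(X_train, y_train, X_test, lam, n_boot=25, seed=0):
    """Average GBLUP predictions over bootstrap samples of the training set."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    preds = np.empty((n_boot, X_test.shape[0]))
    for b in range(n_boot):
        # Draw n training individuals with replacement.
        idx = rng.integers(0, n, size=n)
        preds[b] = gblup_predict(X_train[idx], y_train[idx], X_test, lam)
    # Bagged prediction is the mean over bootstrap replicates.
    return preds.mean(axis=0), preds


if __name__ == "__main__":
    # Simulated toy data (hypothetical): 100 individuals, 500 markers coded 0/1/2.
    rng = np.random.default_rng(42)
    X = rng.integers(0, 3, size=(100, 500)).astype(float)
    beta = np.zeros(500)
    beta[:20] = rng.normal(size=20)  # 20 markers act as QTL
    y = X @ beta + rng.normal(scale=2.0, size=100)

    bagged, reps = bagged_gblup(X[:80], y[:80], X[80:], lam=100.0, n_boot=25)
    # Spread of each candidate's predictions across bootstrap replicates.
    spread = reps.var(axis=0)
    print(bagged[:5])
    print(spread[:5])
```

The number of replicates defaults to 25, the lower end of the 25-50 range mentioned in the abstract; in practice `lam` would be estimated from variance components rather than fixed by hand.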

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computer Simulation
  • Genetic Markers
  • Genome*
  • Genomics / methods*
  • Genotype
  • Least-Squares Analysis
  • Linear Models
  • Models, Genetic
  • Phenotype
  • Predictive Value of Tests
  • Quantitative Trait Loci*
  • Quantitative Trait, Heritable
  • Reproducibility of Results
  • Sample Size
  • Triticum / genetics

Substances

  • Genetic Markers

Grants and funding

Research was partially supported by the Federal Ministry of Education and Research (BMBF, Germany) within the AgroClustEr 15 Synbreed-“Synergistic plant and animal breeding” (FKZ 03115528A), by a U.S. Department of Agriculture Hatch Grant (142-PRJ63CV) to DG, and by the Wisconsin Agriculture Experiment Station. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.