(Quasi) multitask support vector regression with heuristic hyperparameter optimization for whole-genome prediction of complex traits: a case study with carcass traits in broilers

G3 (Bethesda). 2023 Aug 9;13(8):jkad109. doi: 10.1093/g3journal/jkad109.

Abstract

This study investigates nonlinear kernels for multitrait (MT) genomic prediction using support vector regression (SVR) models. We assessed the predictive ability delivered by single-trait (ST) and MT models for 2 carcass traits (CT1 and CT2) measured in purebred broiler chickens. The MT models also included information on indicator traits measured in vivo [Growth and feed efficiency trait (FE)]. We proposed an approach termed (quasi) multitask SVR (QMTSVR), with hyperparameter optimization performed via genetic algorithm. ST and MT Bayesian shrinkage and variable selection models [genomic best linear unbiased predictor (GBLUP), BayesC (BC), and reproducing kernel Hilbert space (RKHS) regression] were employed as benchmarks. MT models were trained using 2 validation designs (CV1 and CV2), which differ in whether information on secondary traits is available in the testing set. Models' predictive ability was assessed with prediction accuracy (ACC; i.e. the correlation between predicted and observed values, divided by the square root of phenotype accuracy), standardized root-mean-squared error (RMSE*), and inflation factor (b). To account for potential bias in CV2-style predictions, we also computed a parametric estimate of accuracy (ACCpar). Predictive ability metrics varied according to trait, model, and validation design (CV1 or CV2), ranging from 0.71 to 0.84 for ACC, 0.78 to 0.92 for RMSE*, and 0.82 to 1.34 for b. The highest ACC and smallest RMSE* were achieved with QMTSVR-CV2 for both traits. For CT1, the selection of model and validation design was sensitive to the choice of accuracy metric (ACC or ACCpar). Nonetheless, the higher predictive accuracy of QMTSVR over MTGBLUP and MTBC held across both accuracy metrics, as did the similar performance of the proposed method and the MTRKHS model. Results showed that the proposed approach is competitive with conventional MT Bayesian regression models using either Gaussian or spike-slab multivariate priors.
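
The three predictive ability metrics named in the abstract (ACC, RMSE*, and b) can be illustrated with a minimal Python sketch. Only the definition of ACC is taken from the abstract. The other two definitions are assumptions based on common usage in genomic prediction and may differ from the ones used in the study: RMSE* is taken here as the RMSE divided by the standard deviation of the observed values, and b as the slope of the regression of observed on predicted values. The function name `predictive_metrics` and the argument `phenotype_accuracy` are hypothetical.

```python
import math


def predictive_metrics(y_obs, y_pred, phenotype_accuracy=1.0):
    """Return ACC, RMSE*, and b for paired observed and predicted values.

    ACC follows the abstract: correlation divided by the square root of
    phenotype accuracy. RMSE* and b use ASSUMED definitions (see lead-in).
    """
    n = len(y_obs)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_obs = sum(y_obs) / n
    mean_pred = sum(y_pred) / n

    # Centered sums of squares and cross-products.
    s_oo = sum((o - mean_obs) ** 2 for o in y_obs)
    s_pp = sum((p - mean_pred) ** 2 for p in y_pred)
    s_op = sum((o - mean_obs) * (p - mean_pred)
               for o, p in zip(y_obs, y_pred))

    # ACC: Pearson correlation scaled by sqrt(phenotype accuracy).
    r = s_op / math.sqrt(s_oo * s_pp)
    acc = r / math.sqrt(phenotype_accuracy)

    # RMSE* (assumed): RMSE over the sample SD of the observed values.
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)
    rmse_std = rmse / math.sqrt(s_oo / (n - 1))

    # b (assumed): slope of the regression of observed on predicted values.
    b = s_op / s_pp

    return {"ACC": acc, "RMSE*": rmse_std, "b": b}


# Toy values for illustration only; these are not data from the study.
metrics = predictive_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Under these assumed definitions, a slope b below 1 suggests that the predictions are over-dispersed (inflated) relative to the observations, and a slope above 1 suggests the opposite.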

Keywords: GenPred; Genomic Prediction; Shared Data Resource; genetic algorithm; kernel methods; machine learning; multitrait models.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Bayes Theorem
  • Chickens* / genetics
  • Genotype
  • Heuristics
  • Models, Genetic
  • Multifactorial Inheritance*
  • Phenotype

Associated data

  • figshare/10.6084/m9.figshare.21538350