The pursuit of genetic gain in agricultural crops through the application of machine-learning to genomic prediction

Front Genet. 2023 Aug 2:14:1186782. doi: 10.3389/fgene.2023.1186782. eCollection 2023.

Abstract

Current practice in agriculture applies genomic prediction to assist crop breeding in the analysis of genetic marker data. Genomic selection methods typically use linear mixed models, but using machine-learning may provide further potential for improved selection accuracy, or may provide additional information. Here we describe SelectML, an automated pipeline for testing and comparing the performance of a range of linear mixed model and machine-learning-based genomic selection methods. We demonstrate the use of SelectML on an in silico-generated marker dataset which simulated a randomly-sampled (mixed) and an unevenly-sampled (unbalanced) population, comparing the relative performance of various methods included in SelectML on the two datasets. Although machine-learning based methods performed similarly overall to linear mixed models, they performed worse on the mixed dataset and marginally better on the unbalanced dataset, being more affected than linear mixed models by the imposed sampling bias. SelectML can assist in the training, comparison, and selection of genomic selection models, and is available from https://github.com/darcyabjones/selectml.

Keywords: crop improvement; genetic gain; genomic prediction; linear-mixed models; machine-learning.

Grants and funding

This research was funded by the Grains Research and Development Corporation (grant: CUR2002–001RTX) of the Australian Government.