TrainSel: An R Package for Selection of Training Populations

Front Genet. 2021 May 7:12:655287. doi: 10.3389/fgene.2021.655287. eCollection 2021.

Abstract

A major barrier to the wider use of supervised learning in emerging applications, such as genomic selection, is the lack of sufficient and representative labeled data to train prediction models. In many applications the amount and quality of labeled training data are limited, so careful selection of the training examples to be labeled can improve accuracy in predictive learning tasks. In this paper, we present an R package, TrainSel, which provides flexible, efficient, and easy-to-use tools for the selection of training populations (STP). We illustrate its use, performance, and potential in four different supervised learning applications within and outside the plant breeding area.

Keywords: genomic prediction; genomic selection; image classification; machine learning; mixed models; multi-objective optimization; training optimization.

Associated data

  • Dryad/10.5061/dryad.461nc