Variable-Selection Emerges on Top in Empirical Comparison of Whole-Genome Complex-Trait Prediction Methods

PLoS One. 2015 Oct 6;10(10):e0138903. doi: 10.1371/journal.pone.0138903. eCollection 2015.

Abstract

Accurate prediction of complex traits from whole-genome data is a computational problem of paramount importance, particularly to plant and animal breeders. Among other challenges, the number of genetic markers is typically orders of magnitude larger than the number of samples (p >> n). We assessed the effectiveness of a diverse set of state-of-the-art methods on publicly accessible real data. The most surprising finding was that approaches with feature selection performed better than others on average, in contrast to the expectation in the community that variable selection is mostly ineffective, i.e., that it does not improve prediction accuracy, despite p >> n. We observed this superior performance even with a somewhat simplistic approach to variable selection, possibly suggesting an inherent robustness. This bodes well in general, since variable selection methods usually improve interpretability without loss of predictive power. Apart from identifying a set of benchmark data sets (including one simulated data set), we also discuss the performance on each data set in terms of its input characteristics.
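
To make the p >> n setting concrete, the following is a minimal illustrative sketch, not the authors' pipeline. It assumes a simple marginal-correlation filter followed by ridge regression, and compares it with ridge regression on all markers using cross-validated correlation between predicted and observed trait values. The function names (`ridge_fit`, `select_top_k`, `cv_accuracy`), the simulated genotypes, and all parameter values are hypothetical choices made for this example. The filter is re-run inside each training fold so that the held-out samples do not influence which markers are kept.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge coefficients via the dual form, which is cheap when p >> n.

    Uses beta = X^T (X X^T + lam * I)^-1 y, which is algebraically
    identical to the usual (X^T X + lam * I)^-1 X^T y.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

def select_top_k(X, y, k):
    """Return indices of the k markers with largest |correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom[denom == 0] = np.inf  # constant markers get a score of 0
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[-k:]

def cv_accuracy(X, y, k=None, lam=1.0, folds=5, seed=0):
    """Cross-validated correlation between predicted and observed y.

    If k is None, all markers are used; otherwise the top-k filter is
    applied to the training fold only.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    preds = np.empty(len(y))
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        Xtr, ytr = X[train], y[train]
        mu_x, mu_y = Xtr.mean(axis=0), ytr.mean()
        if k is None:
            cols = np.arange(X.shape[1])
        else:
            cols = select_top_k(Xtr, ytr, k)
        beta = ridge_fit(Xtr[:, cols] - mu_x[cols], ytr - mu_y, lam)
        preds[test] = (X[test][:, cols] - mu_x[cols]) @ beta + mu_y
    return np.corrcoef(preds, y)[0, 1]

# Hypothetical simulated data: 200 samples, 2000 biallelic markers coded
# 0/1/2, of which 10 (an assumption of this sketch) affect the trait.
rng = np.random.default_rng(42)
n, p, n_causal = 200, 2000, 10
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
causal = rng.choice(p, size=n_causal, replace=False)
y = X[:, causal].sum(axis=1) + rng.normal(size=n)

acc_all = cv_accuracy(X, y, k=None)  # ridge on all markers
acc_sel = cv_accuracy(X, y, k=50)    # filter to 50 markers, then ridge
print(f"all markers: {acc_all:.3f}   top-50 filter: {acc_sel:.3f}")
```

How the two accuracies compare depends on how sparse the simulated trait architecture is, so the numbers printed here say nothing about the real data sets assessed in the study.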

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Genetic Markers / genetics*
  • Genome / genetics
  • Models, Genetic*
  • Quantitative Trait Loci / genetics*
  • Swine
  • Zea mays / genetics

Substances

  • Genetic Markers

Grants and funding

The authors received no specific funding for this work. LP, DCH, IR, DH, ACL, and PK are employed by IBM T. J. Watson Research. ST and ZK are employed by Limagrain Europe. IBM T. J. Watson Research provided support in the form of salaries for authors LP, DCH, IR, DH, ACL, and PK, and Limagrain provided salaries for ST and ZK, but neither IBM T. J. Watson Research nor Limagrain had any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the “author contributions” section.