Cross-validation of component models: a critical look at current methods

R Bro; K Kjeldahl; A K Smilde; H A L Kiers

doi:10.1007/s00216-007-1790-1

Anal Bioanal Chem. 2008 Mar;390(5):1241-51. doi: 10.1007/s00216-007-1790-1. Epub 2008 Jan 24.

Authors

R Bro¹, K Kjeldahl, A K Smilde, H A L Kiers

Affiliation

¹ Chemometrics Group, Faculty of Life Sciences, University of Copenhagen, 1958, Frederiksberg C, Denmark. rb@life.ku.dk

PMID: 18214448

Abstract

In regression, cross-validation is an effective and popular approach that is used to decide, for example, the number of underlying features, and to estimate the average prediction error. The basic principle of cross-validation is to leave out part of the data, build a model, and then predict the left-out samples. While such an approach can also be envisioned for component models such as principal component analysis (PCA), most current implementations do not comply with the essential requirement that the predictions should be independent of the entity being predicted. Further, these methods have not been properly reviewed in the literature. In this paper, we review the most commonly used generic PCA cross-validation schemes and assess how well they work in various scenarios.
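The independence requirement described above can be illustrated with a minimal sketch. This is not code from the paper; the function names, the fold layout, and the use of NumPy's SVD are assumptions chosen for illustration. The first function leaves out rows, fits PCA on the rest, and then projects each left-out row onto the loadings, so the row being "predicted" is also used to compute its own scores. The second function shows one possible way to restore independence: each left-out element is estimated from the other variables in the same row only.

```python
import numpy as np


def _folds(n_rows, n_folds):
    """Split row indices into contiguous folds (an illustrative choice)."""
    return np.array_split(np.arange(n_rows), n_folds)


def _fit_loadings(X_train):
    """Centre the training rows and return (mean, loadings as rows of Vt)."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt


def naive_rowwise_press(X, max_components, n_folds=5):
    """PRESS per component count when left-out rows are projected onto the model.

    The left-out row itself is used to compute its scores, so the
    prediction is NOT independent of the data being predicted.
    """
    press = np.zeros(max_components)
    for test_idx in _folds(X.shape[0], n_folds):
        train = np.delete(X, test_idx, axis=0)
        mean, Vt = _fit_loadings(train)
        X_test = X[test_idx] - mean
        for k in range(1, max_components + 1):
            P = Vt[:k].T                      # (variables, k) loadings
            resid = X_test - X_test @ P @ P.T
            press[k - 1] += np.sum(resid ** 2)
    return press


def elementwise_press(X, max_components, n_folds=5):
    """PRESS where each left-out element is predicted without using itself.

    For a left-out row and variable j, scores are estimated by least
    squares from the remaining variables, then used to predict element j.
    Requires max_components <= number of variables - 1.
    """
    n_vars = X.shape[1]
    press = np.zeros(max_components)
    for test_idx in _folds(X.shape[0], n_folds):
        train = np.delete(X, test_idx, axis=0)
        mean, Vt = _fit_loadings(train)
        X_test = X[test_idx] - mean
        for k in range(1, max_components + 1):
            P = Vt[:k].T
            for x in X_test:
                for j in range(n_vars):
                    keep = np.arange(n_vars) != j
                    # Scores from all variables except j.
                    t = np.linalg.lstsq(P[keep], x[keep], rcond=None)[0]
                    press[k - 1] += (x[j] - P[j] @ t) ** 2
    return press
```

Because the loading subspaces are nested, the naive PRESS can never increase as components are added, so it cannot point to an optimal model size by itself. The element-wise variant has no such guarantee and can rise once additional components start to fit noise, which is the behaviour a cross-validation criterion needs.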

MeSH terms

Computer Simulation
Models, Biological
Principal Component Analysis / methods*
Principal Component Analysis / standards*
Reproducibility of Results