Estimating error rates in the classification of paired organs

Stat Med. 2008 Sep 30;27(22):4515-31. doi: 10.1002/sim.3310.

Abstract

Clinical data from paired organs present a dependence structure that must be accounted for when making statistical inference or evaluating classification rules with resampling-based techniques (bootstrap, cross-validation). We introduce a paired cross-validation approach for the estimation of misclassification error rates in the classification of data from paired organs. The dependence structure of the sample is honored by performing cross-validation at the subject level. Theoretical considerations as well as a case-control study on glaucoma diagnosis and a simulation study show that the variance of the paired cross-validation estimator is considerably lower than that of traditional cross-validation error estimation using one randomly selected eye per subject. The actual variance reduction is mainly controlled by the contribution of differential misclassification between the two eyes to the overall error rate. By contrast, 'ad hoc' cross-validation that ignores the autocorrelation of paired organs leads to biased error estimates. Using the double-bagging technique, we also show that classification accuracy can be improved by using information from both eyes in training machine-learning classifiers. In glaucoma detection, the reduction in misclassification error rates achieved by training on data from both eyes is equivalent to an increase in sample size of one-third to one-half, which is an important achievement in clinical studies.
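The subject-level splitting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (subject_level_folds, paired_cv_error), the threshold classifier, and the toy data are all hypothetical and chosen only for illustration. The sketch assumes one feature per eye and one diagnosis per subject, and it keeps both eyes of a subject in the same fold so that no subject contributes to training and testing at the same time.

```python
import random
from collections import defaultdict


def subject_level_folds(subject_ids, k, seed=0):
    """Yield (train_idx, test_idx) with all rows of a subject in the same fold."""
    rows_by_subject = defaultdict(list)
    for row, subject in enumerate(subject_ids):
        rows_by_subject[subject].append(row)
    subjects = sorted(rows_by_subject)
    random.Random(seed).shuffle(subjects)  # randomize subjects, not single eyes
    for fold in range(k):
        held_out = set(subjects[fold::k])
        train_idx, test_idx = [], []
        for subject in subjects:
            target = test_idx if subject in held_out else train_idx
            target.extend(rows_by_subject[subject])
        yield train_idx, test_idx


def paired_cv_error(features, labels, subject_ids, fit, predict, k=3, seed=0):
    """Cross-validated misclassification rate over all eyes, split by subject."""
    errors = 0
    for train_idx, test_idx in subject_level_folds(subject_ids, k, seed):
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        errors += sum(predict(model, features[i]) != labels[i] for i in test_idx)
    return errors / len(labels)


# Hypothetical stand-in classifier: threshold at the midpoint of the class means.
def fit_threshold(xs, ys):
    controls = [x for x, y in zip(xs, ys) if y == 0]
    cases = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(controls) / len(controls) + sum(cases) / len(cases)) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)


# Toy data (assumption): 6 subjects, 2 eyes each, one diagnosis per subject.
subject_ids = [s for s in range(6) for _ in range(2)]
labels = [s % 2 for s in subject_ids]
features = [2.0 * (s % 2) + 0.1 * s + 0.05 * eye
            for s in range(6) for eye in range(2)]

error_rate = paired_cv_error(features, labels, subject_ids,
                             fit_threshold, predict_threshold, k=3)
print(f"paired CV misclassification rate: {error_rate:.3f}")
```

Splitting the rows directly, without grouping by subject, would correspond to the 'ad hoc' cross-validation that the abstract reports as biased, because one eye of a subject could be used for training while the fellow eye is used for testing.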

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias
  • Case-Control Studies
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Diagnostic Errors*
  • Glaucoma / diagnosis
  • Humans
  • Middle Aged
  • Registries
  • Reproducibility of Results
  • Sensitivity and Specificity