Reader studies for validation of CAD systems

Neural Netw. 2008 Mar-Apr;21(2-3):387-97. doi: 10.1016/j.neunet.2007.12.013. Epub 2007 Dec 23.

Abstract

Evaluation of computational intelligence (CI) systems designed to improve the performance of a human operator is complicated by the need to include the effect of human variability. In this paper, we consider human (reader) variability in the context of medical imaging computer-assisted diagnosis (CAD) systems, and we outline how to compare the detection performance of readers with and without CAD. An effective and statistically powerful comparison can be accomplished with a receiver operating characteristic (ROC) experiment, summarized by the reader-averaged area under the ROC curve (AUC). The comparison requires sophisticated yet well-developed methods for multi-reader multi-case (MRMC) variance analysis. MRMC variance analysis accounts for random readers, random cases, and correlations in the experiment. In this paper, we extend the methods available for estimating this variability. Specifically, we present a method that can treat arbitrary study designs. Most existing methods treat only the fully crossed study design, where every reader reads every case in both experimental conditions. We demonstrate our method with a computer simulation, and we assess the statistical power of a variety of study designs.
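
To make the MRMC setup concrete, the sketch below simulates a fully crossed reader study (every reader scores every case, with and without CAD), computes the reader-averaged empirical AUC in each condition, and uses Monte Carlo replication over whole studies to estimate the variability of the AUC difference. This is only an illustration under a simplified Roe-Metz-style linear score model; the function names (empirical_auc, one_study), all variance parameters, and the rough z-test power approximation are assumptions of this sketch, not the simulation design used in the paper.

```python
# Minimal MRMC sketch, assuming a simplified Roe-Metz-style score model.
# All parameter values below are illustrative, not the authors' settings.
import numpy as np

rng = np.random.default_rng(0)

def empirical_auc(neg, pos):
    # Mann-Whitney estimate of AUC: P(pos > neg) + 0.5 * P(pos == neg)
    d = pos[:, None] - neg[None, :]
    return (d > 0).mean() + 0.5 * (d == 0).mean()

def one_study(n_readers=5, n0=50, n1=50, sep=(1.0, 1.5)):
    # One fully crossed study: every reader scores every case in both
    # conditions (sep[0] = without CAD, sep[1] = with CAD). Case effects
    # and reader skill are shared across conditions, which induces the
    # correlations that MRMC variance analysis must account for.
    c0 = rng.normal(size=n0)                       # non-diseased case effects
    c1 = rng.normal(size=n1)                       # diseased case effects
    skill = rng.normal(scale=0.3, size=n_readers)  # random reader effects
    aucs = []
    for s in sep:
        per_reader = []
        for j in range(n_readers):
            neg = c0 + rng.normal(scale=0.5, size=n0)  # reader-by-case noise
            pos = s + skill[j] + c1 + rng.normal(scale=0.5, size=n1)
            per_reader.append(empirical_auc(neg, pos))
        aucs.append(np.mean(per_reader))           # reader-averaged AUC
    return aucs

# Monte Carlo over whole studies: readers AND cases are redrawn each time,
# so the spread of delta-AUC reflects MRMC variability, not case sampling alone.
deltas = []
for _ in range(2000):
    auc_without, auc_with = one_study()
    deltas.append(auc_with - auc_without)
deltas = np.asarray(deltas)
se = deltas.std(ddof=1)
print(f"mean delta-AUC = {deltas.mean():.3f}, MRMC SE = {se:.3f}")
# Rough power of a two-sided z-test at alpha = 0.05 for this effect size:
print(f"approx. power = {np.mean(np.abs(deltas) / se > 1.96):.2f}")
```

Two design points in the sketch mirror the abstract: sharing the case effects and reader skill across the two conditions is what creates the correlations that an MRMC analysis must model, and redrawing both readers and cases on every replicate is what makes both sources of variability random. Other study designs (e.g., splitting the case set among readers rather than fully crossing) would change only which loops share which effects, which is how the relative statistical power of designs could be compared in a simulation of this kind.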

MeSH terms

  • Biomedical Research*
  • Computer Simulation
  • Diagnosis, Computer-Assisted*
  • Diagnostic Imaging
  • Disease
  • Humans
  • Monte Carlo Method
  • Neural Networks, Computer*
  • ROC Curve*
  • Reproducibility of Results
  • Species Specificity