The effect of uncertainty in patient classification on diagnostic performance estimations

PLoS One. 2019 May 22;14(5):e0217146. doi: 10.1371/journal.pone.0217146. eCollection 2019.

Abstract

Background: The performance of a new diagnostic test is typically evaluated against a comparator which is assumed to correspond closely to some true state of interest. Judgments about the new test's performance are based on the differences between the outputs of the test and comparator. It is commonly assumed that a small amount of uncertainty in the comparator's classifications will negligibly affect the measured performance of a diagnostic test.

Methods: Simulated datasets were generated to represent typical diagnostic scenarios. Comparator noise was introduced in the form of random misclassifications, and the effect on the apparent performance of the diagnostic test was determined. An actual dataset from a clinical trial on a new diagnostic test for sepsis was also analyzed.
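
The simulation approach described in the Methods can be illustrated with a minimal sketch. It is not the authors' code (their source is linked in the Conclusions); it assumes a simplified binary test, a comparator whose labels are flipped independently at a fixed rate, and hypothetical names such as `simulate_apparent_performance` and `noise_rate`.

```python
import random


def simulate_apparent_performance(n=20000, prevalence=0.5,
                                  test_sensitivity=0.95, test_specificity=0.95,
                                  noise_rate=0.05, seed=0):
    """Return (apparent_sensitivity, apparent_specificity) of a binary test
    scored against a comparator with random misclassifications.

    Assumption: comparator errors are independent of the test result and
    equally likely for true positives and true negatives.
    """
    rng = random.Random(seed)
    tp = fp = tn = fn = 0
    for _ in range(n):
        truth = rng.random() < prevalence
        # The test result depends only on the true state of the patient.
        if truth:
            test = rng.random() < test_sensitivity
        else:
            test = rng.random() >= test_specificity
        # The comparator matches the truth unless it is randomly flipped.
        comparator = (not truth) if rng.random() < noise_rate else truth
        # Tally the 2x2 table against the comparator, not the truth.
        if comparator and test:
            tp += 1
        elif comparator and not test:
            fn += 1
        elif not comparator and test:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    for rate in (0.0, 0.05, 0.10, 0.20):
        sens, spec = simulate_apparent_performance(noise_rate=rate)
        print(f"noise={rate:.2f}  apparent sens={sens:.3f}  spec={spec:.3f}")
```

Scoring the test against the noisy comparator rather than the true state is what produces the apparent loss of performance. A continuous test score and the area under the ROC curve could be handled the same way, but are left out here for brevity.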

Results: We demonstrate that as little as 5% misclassification of patients by the comparator can be enough to statistically invalidate performance estimates such as sensitivity, specificity and area under the receiver operating characteristic curve if this uncertainty is not measured and taken into account. Under some common diagnostic scenarios, this distortion effect is found to increase non-linearly with comparator uncertainty. For clinical populations exhibiting high degrees of classification uncertainty, failure to measure and account for this effect will introduce significant risks of drawing false conclusions. The effect of classification uncertainty is magnified further for high-performing tests that would otherwise reach near-perfection in diagnostic evaluation trials. A requirement of very high diagnostic performance for clinical adoption, such as a 99% sensitivity, can be rendered nearly unachievable even for a perfect test if the comparator diagnosis contains a small amount of uncertainty. This paper and an accompanying online simulation tool demonstrate the effect of classification uncertainty on the apparent performance of tests across a range of typical diagnostic scenarios. Both simulated and real datasets are used to show the degradation of apparent test performance as comparator uncertainty increases.
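
The claim about a perfect test can be checked with a short calculation. The sketch below is an illustration under stated assumptions, not a formula taken from the paper: the test always agrees with the true state, and the comparator flips each label independently with probability `noise_rate`. The function name and parameters are hypothetical.

```python
def apparent_sensitivity_of_perfect_test(prevalence, noise_rate):
    """Expected sensitivity of a perfect test scored against a noisy comparator.

    Assumption: each comparator label is flipped independently with
    probability noise_rate, regardless of the true state.
    """
    # True positives the comparator still labels positive: the test agrees.
    true_pos_kept = prevalence * (1.0 - noise_rate)
    # True negatives mislabelled positive: the perfect test "misses" these.
    true_neg_flipped = (1.0 - prevalence) * noise_rate
    return true_pos_kept / (true_pos_kept + true_neg_flipped)


if __name__ == "__main__":
    for prevalence in (0.5, 0.2):
        value = apparent_sensitivity_of_perfect_test(prevalence, 0.05)
        print(f"prevalence={prevalence:.1f}  apparent sensitivity={value:.3f}")
```

Under these assumptions, a 5% misclassification rate at 50% prevalence caps the apparent sensitivity of a perfect test at 0.95, and at 20% prevalence at about 0.83, so a 99% sensitivity requirement could not be met in either case.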

Conclusions: Overall, a 5% or greater misclassification rate by the comparator can lead to significant underestimation of true test performance. An online simulation tool allows researchers to explore this effect using their own trial parameters (https://imperfect-gold-standard.shinyapps.io/classification-noise/) and the source code is freely available (https://github.com/ksny/Imperfect-Gold-Standard).

MeSH terms

  • Computer Simulation
  • Diagnostic Tests, Routine / standards*
  • Diagnostic Tests, Routine / statistics & numerical data*
  • Humans
  • Models, Statistical*
  • ROC Curve
  • Sepsis / classification*
  • Sepsis / diagnosis*
  • Uncertainty

Grants and funding

Immunexpress provided salary support for Leo C. McHugh and Thomas D. Yager over the course of this work. The funding organization did not play a role in the study design, data collection and analysis, or choice of content in the manuscript, and only provided financial support in the form of these authors' salaries. The specific roles of these authors are articulated in the ‘author contributions’ section of the manuscript.