Ultrasound methods to distinguish between malignant and benign adnexal masses in the hands of examiners with different levels of experience

Ultrasound Obstet Gynecol. 2009 Oct;34(4):454-61. doi: 10.1002/uog.6443.

Abstract

Objectives: To determine the effect of an ultrasound training course on the performance of pattern recognition when used by less experienced examiners, and to compare, between examiners with different levels of experience, the performance of three methods for estimating the risk of malignancy: pattern recognition, a logistic regression model and a scoring system.

Methods: Using ultrasound images of selected adnexal masses, two trainees classified the masses as benign or malignant using pattern recognition both before and after attending a theoretical gynecological ultrasound course. They also classified the masses using a logistic regression model and a scoring system, but only after the course. The performance of the three methods in the hands of the trainees was then compared with their performance in the hands of experts.

Results: One hundred and sixty-five adnexal masses were included, of which 42% were malignant (21% invasive tumors and 21% borderline tumors). The area under the receiver operating characteristic (ROC) curve of pattern recognition when used by the trainees was similar before and after they had attended the course. Training decreased sensitivity (84% vs. 70% for Trainee 1, P = 0.004; 70% vs. 61% for Trainee 2, P = 0.058) and increased specificity (77% vs. 92% for Trainee 1, P = 0.001; 89% vs. 95% for Trainee 2, P = 0.058). The performance of pattern recognition was poorer in the hands of the trainees than in the hands of the experts. The sensitivities of the logistic regression model were 70% and 54% for the trainees vs. 83% for an expert (P = 0.020 and < 0.001, respectively), and the specificities were 84% and 94% vs. 89% (P = 0.25 and 0.59, respectively). The sensitivities of the scoring system were 59% and 54% for the trainees vs. 75% for the expert (P = 0.002 and < 0.001, respectively), and the specificities were 90% and 93% vs. 85% (P = 0.103 and 0.008, respectively).
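Sensitivity is the proportion of malignant masses classified as malignant, and specificity is the proportion of benign masses classified as benign. Each comparison in the results pairs two sets of classifications of the same masses (one trainee before and after the course, or a trainee and an expert). The abstract does not state which test produced the P values. The Python sketch below assumes an exact McNemar test on discordant pairs, a common choice for paired binary outcomes, purely as an illustration. The function names and all counts in the usage lines are hypothetical and are not data from the study.

```python
from math import comb


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of truly malignant masses classified as malignant."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of truly benign masses classified as benign."""
    return tn / (tn + fp)


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar P value (assumed test, for illustration only).

    b: masses the first rater classified correctly and the second did not
    c: masses the first rater classified incorrectly and the second did not
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of a split at least as extreme in the smaller tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study
    print(sensitivity(tp=40, fn=10))   # → 0.8
    print(specificity(tn=90, fp=10))   # → 0.9
    print(exact_mcnemar_p(b=10, c=2))  # → 0.03857421875
```

Only the discordant pairs enter the paired test, which is why two examiners with similar overall sensitivities can still differ significantly if they disagree on many of the same masses.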

Conclusion: Theoretical ultrasound teaching did not seem to improve the performance of pattern recognition in the hands of trainees. A logistic regression model and a scoring system for classifying adnexal masses as benign or malignant performed less well when used by inexperienced examiners than when used by an expert. Before a model or a scoring system is used, experience and/or proper training are likely to be of paramount importance if diagnostic performance is to be optimized.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adnexal Diseases / diagnostic imaging*
  • Clinical Competence / standards*
  • Female
  • Gynecology / standards
  • Humans
  • Male
  • Obstetrics / standards
  • Ovarian Neoplasms / diagnostic imaging
  • Pattern Recognition, Automated / standards*
  • ROC Curve
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Ultrasonography, Doppler, Color