Education research: Bias and poor interrater reliability in evaluating the neurology clinical skills examination

Neurology. 2009 Sep 15;73(11):904-8. doi: 10.1212/WNL.0b013e3181b35212. Epub 2009 Jul 15.

Abstract

Objective: The American Board of Psychiatry and Neurology (ABPN) has recently replaced the traditional, centralized oral examination with the locally administered Neurology Clinical Skills Examination (NEX). The ABPN postulated that the experience with the NEX would be similar to that with the Mini-Clinical Evaluation Exercise, a reliable and valid assessment tool. The reliability and validity of the NEX have not been established.

Methods: NEX encounters were videotaped at 4 neurology programs. Local faculty and ABPN examiners graded the encounters using 2 different evaluation forms: an ABPN form and one with a contracted rating scale. Residents purposely failed some NEX encounters. Cohen's kappa and intraclass correlation coefficients (ICC) were calculated for local vs ABPN examiners.
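
For readers unfamiliar with the pass/fail agreement statistic named in the Methods, the following is a minimal Python sketch of Cohen's kappa for two raters. The function name `cohens_kappa` and the ten example ratings are hypothetical illustrations. They are not the study's data or its analysis code.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each label, multiply the two raters' marginal
    # proportions, then sum over all labels.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for ten encounters (NOT taken from the study).
local_faculty = ["pass"] * 8 + ["fail"] * 2
abpn_examiner = ["pass"] * 5 + ["fail"] * 5

kappa = cohens_kappa(local_faculty, abpn_examiner)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 7 of 10 encounters, but half of that agreement is expected by chance alone, so kappa is about 0.4. Kappa equals 1 for perfect agreement and 0 for agreement no better than chance.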

Results: Ninety-eight videotaped NEX encounters of 32 residents were evaluated by 20 local faculty evaluators and 18 ABPN examiners. The interrater reliability for a determination of pass vs fail for each encounter was poor (kappa 0.32; 95% confidence interval [CI] = 0.11, 0.53). ICC between local faculty and ABPN examiners for each performance rating on the ABPN NEX form was poor to moderate (ICC range 0.14-0.44), and did not improve with the contracted rating form (ICC range 0.09-0.36). ABPN examiners were more likely than local examiners to fail residents.

Conclusions: There is poor interrater reliability between local faculty and American Board of Psychiatry and Neurology examiners. A bias toward favorable assessment by local faculty was detected, which raises concern about the validity of the examination. Further study is needed to assess whether training can improve interrater reliability and offset bias.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias*
  • Clinical Competence / standards*
  • Educational Measurement* / methods
  • Educational Measurement* / standards
  • Evaluation Studies as Topic
  • Female
  • Humans
  • Internship and Residency / standards*
  • Male
  • Neurology / education*
  • Psychiatry / education
  • Reproducibility of Results
  • Videotape Recording