Assessment of reliability in the clinical evaluation of depressive symptoms among multiple investigators in a multicenter clinical trial

Psychiatry Res. 2001 Jun 1;102(2):163-73. doi: 10.1016/s0165-1781(01)00249-9.

Abstract

The objective of this work was to assess the reliability of ratings of depressive symptom severity when multiple clinical examiners evaluate a single subject, in preparation for their participation as evaluators in a clinical trial. Using the 17-item Hamilton Depression Rating Scale (HDRS), 37 psychiatrists independently assessed a videotaped interview of a patient with symptoms of depression. A new measure for the detection of multiple examiners not in consensus (DOMENIC) was used to identify scale items with low reliability and raters whose scores deviated from those of the remaining raters. Overall inter-rater agreement on the full HDRS was 'excellent' (97%). All raters but one showed adequate agreement both on individual items and on total scores. Two of the 17 HDRS symptomatology items had unacceptable levels of inter-rater scoring variability (<70% agreement). The use of DOMENIC allows for the detection of items of low inter-rater reliability and the identification of raters who deviate from the group's ratings before a clinical trial begins.
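The screening idea described above can be illustrated with a rough sketch. The abstract does not give the DOMENIC formula, so the code below does not implement DOMENIC; it assumes a simpler stand-in, the share of raters who chose the most common (modal) score on each item, and reuses the 70% cut-off only as an example threshold. The function names and the toy scores are hypothetical.

```python
from collections import Counter

# NOTE: this is NOT the DOMENIC measure, whose formula is not given in the
# abstract. Agreement is assumed here to be the share of raters who gave
# the modal (most common) score on an item.

def item_agreement(scores):
    """Return (modal_score, share_of_raters_at_mode) for each item.

    scores: list of rows, one row per rater, one column per scale item.
    """
    n_raters = len(scores)
    result = []
    for item_scores in zip(*scores):  # iterate over items (columns)
        modal_score, modal_count = Counter(item_scores).most_common(1)[0]
        result.append((modal_score, modal_count / n_raters))
    return result

def flag_low_agreement_items(scores, threshold=0.70):
    """Indices of items whose modal share falls below the threshold."""
    return [i for i, (_, share) in enumerate(item_agreement(scores))
            if share < threshold]

def rater_match_rates(scores):
    """For each rater, the fraction of items scored at the group's mode."""
    modes = [mode for mode, _ in item_agreement(scores)]
    return [sum(s == m for s, m in zip(row, modes)) / len(modes)
            for row in scores]

# Made-up data for illustration only: 4 raters scoring 3 items.
toy_scores = [
    [2, 1, 0],
    [2, 1, 3],
    [2, 1, 1],
    [2, 0, 2],
]

low_items = flag_low_agreement_items(toy_scores)
match_rates = rater_match_rates(toy_scores)
```

In this toy example the third item would be flagged because no score is shared by at least 70% of raters, and the last rater has the lowest match rate with the group. A real analysis would use the measure defined in the paper.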

Publication types

  • Clinical Trial
  • Multicenter Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adult
  • Depression / diagnosis*
  • Depression / epidemiology
  • Female
  • Humans
  • Observer Variation
  • Psychometrics / statistics & numerical data
  • Reproducibility of Results
  • Severity of Illness Index