Are evaluations in simulated medical encounters reliable among rater types? A comparison between standardized patient and outside observer ratings of OSCEs

PEC Innov. 2023 Jan 29;2:100125. doi: 10.1016/j.pecinn.2023.100125. eCollection 2023 Dec.

Abstract

Objective: By analyzing Objective Structured Clinical Examination (OSCE) evaluations of first-year interns' communication with standardized patients (SPs), our study aimed to examine differences between the ratings of SPs and those of a set of outside observers trained in healthcare communication.

Methods: Immediately following completion of the OSCEs, SPs evaluated interns' communication skills using a 30-item measure. Later, two observers independently coded video recordings of the encounters using the same items. We conducted two-tailed t-tests to examine differences between SP and observer ratings.
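As a rough illustration of the item-level comparison described above, the sketch below runs a two-tailed t-test on hypothetical scores for a single item. The abstract does not state whether paired or independent tests were used or what rating scale applied; pairing by encounter is assumed here, and all data and variable names are illustrative, not taken from the study.

    import numpy as np
    from scipy import stats

    # Hypothetical per-encounter scores for one of the 30 items:
    # SP in-person ratings vs. averaged observer video ratings.
    sp_scores = np.array([5, 4, 5, 5, 3, 4, 5, 4])
    observer_scores = np.array([4, 3, 4, 4, 3, 3, 4, 4])

    # Two-tailed paired t-test; ttest_ind would apply instead if
    # the two sets of ratings were treated as independent samples.
    t_stat, p_value = stats.ttest_rel(sp_scores, observer_scores)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # flag items with p < .05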

Results: Rater scores differed significantly on 21 of the 30 items (p < .05), with 20 of the 21 differences reflecting higher SP in-person evaluation scores. The items most divergent between SPs and observers concerned empathic and nonverbal communication.

Conclusion: Differences between SP and observer ratings should be investigated further to determine whether additional rater training or a revised evaluation measure is needed. Educators may benefit from reducing the number of items raters must complete, for example by consolidating fine-grained items into more global questions about each criterion. The evaluation measure may also be strengthened through reliability and validity testing.

Innovation: This study highlights the strengths and limitations of rater types (observers vs. SPs), as well as of evaluation methods (recorded vs. in-person).

Keywords: GME needs assessments; Graduate medical education; Interpreter devices; OSCE; Patient-clinician communication.