Psychometric Testing of the Debriefing Assessment for Simulation in Healthcare (DASH) for Trainee-led, In Situ Simulations in the Pediatric Emergency Department Context

AEM Educ Train. 2020 Jun 17;5(2):e10482. doi: 10.1002/aet2.10482. eCollection 2021 Apr.

Abstract

Objectives: Effective trainee-led debriefing after critical events in the pediatric emergency department has the potential to improve patient care, but debriefing assessments for this context have not been developed. This study gathers preliminary validity and reliability evidence for the Debriefing Assessment for Simulation in Healthcare (DASH) as an assessment of trainee-led post-critical event debriefing.

Methods: Eight fellows led teams in three simulated critical events, each followed by a video-recorded discussion of performance mimicking the impromptu debriefings that occur after real clinical events. Three raters assessed the recorded debriefings using the DASH, and their feedback was collated. Data were analyzed using generalizability theory, Gwet's AC2, the intraclass correlation coefficient (ICC), and coefficient alpha. Validity was examined using Messick's framework.

Results: The DASH instrument had relatively low traditional inter-rater reliability (Gwet's AC2 = 0.24; single-rater ICC range = 0.16-0.35), with variance components of 30% for fellows, 19% for raters, and 23% for the rater-by-fellow interaction. The DASH generalizability (G) coefficient was 0.72, indicating inadequate reliability for research purposes. Decision (D) study results suggest that the DASH can attain a G coefficient of 0.8 with five or more raters. Coefficient alpha for the DASH was 0.95. A total of 90% and 40% of items from Elements 1 and 4, respectively, were deemed "not applicable" or left blank.
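For context, the D-study projection rests on the standard generalizability-theory relation between the G coefficient and the number of raters averaged per ratee. As a hedged sketch (the symbols below are generic and assume a fellows-crossed-with-raters design; they are not reproduced from the study's analysis):

$$
E\rho^2 \;=\; \frac{\sigma^2_{f}}{\sigma^2_{f} + \dfrac{\sigma^2_{fr,e}}{n'_{r}}}
$$

where $\sigma^2_{f}$ is the fellow (true-score) variance, $\sigma^2_{fr,e}$ is the rater-by-fellow interaction plus residual error, and $n'_{r}$ is the projected number of raters. Increasing $n'_{r}$ shrinks the rater-dependent error term, which is how averaging over five or more raters can raise a G coefficient of 0.72 toward the 0.8 threshold.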

Conclusions: Our results suggest that the DASH does not have sufficient validity and reliability to rigorously assess debriefing in the post-critical event environment but may be amenable to modification. Further development of the tool will be needed for optimal use in this context.