Purpose: To examine the inter-rater reliability of expert and non-expert observers when they used objective structured checklists to evaluate candidates' performances on three simulated medical procedures.
Method: Simulations and structured checklists were developed for three medical procedures: endotracheal intubation, application of a forearm cast, and suturing of a simple skin laceration. Groups composed of two expert and two non-expert observers scored 101 performances of these procedures by 38 medical trainees and practitioners of varying skill levels. Inter-rater reliability was assessed using Pearson correlation coefficients.
Results: Inter-rater reliability was good for expert/expert, expert/non-expert, and non-expert/non-expert pairings in all three skills simulations.
Conclusion: Both expert and non-expert observers demonstrated good inter-rater reliability when using structured checklists to assess procedural skills. Further study is required to determine whether this conclusion may be extrapolated to other study groups or procedures.
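The pairwise analysis named in the Method can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes each observer's checklist result is reduced to one total score per performance, and the rater labels and score values are hypothetical, invented only to show the computation.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a rater gives constant scores")
    return cov / sqrt(var_x * var_y)


def pairwise_reliability(scores_by_rater):
    """Return Pearson r for every pair of raters who scored the same performances."""
    return {
        (a, b): pearson_r(scores_by_rater[a], scores_by_rater[b])
        for a, b in combinations(scores_by_rater, 2)
    }


# Hypothetical checklist totals for five performances (not data from the study).
scores = {
    "expert_1": [18, 12, 15, 9, 20],
    "expert_2": [17, 13, 15, 10, 19],
    "non_expert_1": [16, 11, 14, 9, 20],
    "non_expert_2": [18, 12, 13, 8, 19],
}

for pair, r in pairwise_reliability(scores).items():
    print(pair, round(r, 3))
```

With four observers this yields six pairings, covering the expert/expert, expert/non-expert, and non-expert/non-expert comparisons reported in the Results. Pearson's r measures linear association between two raters' scores, so a pair of raters can correlate highly even if one scores consistently higher than the other.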