Hawks and Doves: Perceptions and Reality of Faculty Evaluations

Jillian Zavodnick; Jonathan Doroshow; Sarah Rosenberg; Joshua Banks; Benjamin E Leiby; Nina Mingioni

doi:10.1177/23821205231197079

J Med Educ Curric Dev. 2023 Sep 8:10:23821205231197079. doi: 10.1177/23821205231197079. eCollection 2023 Jan-Dec.

Authors

Jillian Zavodnick¹, Jonathan Doroshow², Sarah Rosenberg¹, Joshua Banks³, Benjamin E Leiby³, Nina Mingioni¹

Affiliations

¹ Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, USA.
² Department of Medicine, Lankenau Medical Center, Wynnewood, USA.
³ Department of Pharmacology and Experimental Therapeutics, Division of Biostatistics, Thomas Jefferson University, Philadelphia, USA.

Abstract

Objectives: Internal medicine clerkship grades are important for residency selection, but inconsistencies between evaluator ratings threaten their ability to accurately represent student performance and perceived fairness. Clerkship grading committees are recommended as best practice, but the mechanisms by which they promote accuracy and fairness are not certain. The ability of a committee to reliably assess and account for grading stringency of individual evaluators has not been previously studied.

Methods: This is a retrospective analysis of evaluations completed by faculty whom members of a single medical college's grading committee considered to be stringent, lenient, or neutral graders. Faculty evaluations were assessed for differences between perceived stringency categories in ratings on individual skills and in recommendations for final grade. Logistic regression was used to determine whether the ratings actually assigned varied with the faculty member's perceived grading stringency category.
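As a rough illustration of the kind of comparison described in the Methods, the sketch below computes an unadjusted odds ratio and a Wald 95% confidence interval for awarding an above-average rating in one perceived-stringency group versus another. For a single binary predictor, this odds ratio equals the exponentiated coefficient of a simple logistic regression. The function name and all counts are hypothetical and are not taken from the study, whose model may have included additional terms.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Return (odds ratio, lower, upper) for a 2x2 table using a Wald interval.

    a: above-average ratings given by group 1
    b: other ratings given by group 1
    c: above-average ratings given by group 2
    d: other ratings given by group 2
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts for illustration only (NOT data from the study):
# (above-average ratings, other ratings) for each perceived category.
easy_graders = (60, 40)
hard_graders = (45, 55)

odds_ratio, lower, upper = odds_ratio_with_ci(*easy_graders, *hard_graders)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant difference at roughly the .05 level in this unadjusted comparison. With three categories (stringent, neutral, lenient), the same calculation would be repeated for each pair.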

Results: "Easy graders" consistently had the highest probability of awarding an above-average rating, and "hard graders" consistently had the lowest probability of awarding an above-average rating, though this finding reached statistical significance for only 2 of 8 questions on the evaluation form (P = .033 and P = .001). Odds ratios of assigning a higher final suggested grade followed the expected pattern (higher for "easy" and "neutral" compared to "hard," higher for "easy" compared to "neutral") but did not reach statistical significance.

Conclusions: Perceived differences in faculty grading stringency have a basis in reality for clerkship evaluation elements. However, final grades recommended by faculty perceived as "stringent" or "lenient" did not differ significantly. Perceptions of "hawks" and "doves" are not just lore, but they may not have implications for students' final grades. Continued research to describe the "hawk and dove effect" will be crucial to enable assessment of local grading variation and empower local educational leadership to correct, but not overcorrect, for this effect to maintain fairness in student evaluations.

Keywords: Assessment; clerkship; evaluation; faculty; grade; grading committee; internal medicine.