Examining Implicit Bias Differences in Pediatric Surgical Fellowship Letters of Recommendation Using Natural Language Processing

Geoffrey M Gray; Sacha A Williams; Bryce Bludevich; Iris Irby; Henry Chang; Paul D Danielson; Raquel Gonzalez; Christopher W Snyder; Luis M Ahumada; Nicole M Chandler

doi:10.1016/j.jsurg.2022.12.002

J Surg Educ. 2023 Apr;80(4):547-555. doi: 10.1016/j.jsurg.2022.12.002. Epub 2022 Dec 17.

Affiliations

¹ Center for Pediatric Data Science and Analytics Methodology, Johns Hopkins All Children's Hospital, St. Petersburg, Florida.
² Division of Pediatric Surgery, Johns Hopkins All Children's Hospital, St. Petersburg, Florida.
³ Division of Pediatric Surgery, Johns Hopkins All Children's Hospital, St. Petersburg, Florida. Electronic address: nicole.chandler@jhmi.edu.

PMID: 36529662

Abstract

Objective: We used natural language processing (NLP) to analyze the prevalence and type of bias in letters of recommendation (LOR) submitted with pediatric surgical fellowship applications to a quaternary care academic hospital from 2016 to 2021.

Design: Demographics were extracted from submitted applications. The Valence Aware Dictionary for sEntiment Reasoning (VADER) model was used to calculate polarity scores. The National Research Council dataset was used for emotion and intensity analysis. The Kruskal-Wallis H-test was used to determine statistical significance.

Setting: This study took place at a single, academic, freestanding quaternary care children's hospital with an ACGME-accredited pediatric surgery fellowship.

Participants: Applicants to a single pediatric surgery fellowship were selected for this study from 2016 to 2021. A total of 182 individual applicants were included and 701 letters of recommendation were analyzed.

Results: Black applicants had the highest mean polarity (most positive), while Hispanic applicants had the lowest. Overall differences between polarity distributions were not statistically significant. In the emotion intensity analysis, differences in "anger" were statistically significant (p=0.03). Mean polarity was higher for applicants who successfully matched in pediatric surgery.

Discussion: This study identified differences, by race and gender, in LORs submitted as part of pediatric surgical fellowship applications to a single training program. Bias in letters of recommendation can lead to inequities in the demographics of a given program. Although such bias is difficult for humans to detect, natural language processing can detect it, along with differences in polarity and emotional intensity. While the types of emotions identified in this study were highly similar across race and gender groups, their intensity differed, with the difference in "anger" being the most significant.

Conclusion: From this work, it can be concluded that bias in LORs, reflected as differences in polarity, is likely a result of the intensity of the emotions used rather than the types of emotions expressed. Natural language processing shows promise in identifying subtle areas of bias that may influence an individual's likelihood of successful matching.

Keywords: bias; letters of recommendation; natural language processing; pediatric surgery fellowship; valence aware dictionary for sentiment reasoning (VADER).

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bias, Implicit
Child
Fellowships and Scholarships
Humans
Internship and Residency*
Natural Language Processing
Personnel Selection
Specialties, Surgical*