Avoiding bias when inferring race using name-based approaches

PLoS One. 2022 Mar 1;17(3):e0264270. doi: 10.1371/journal.pone.0264270. eCollection 2022.

Abstract

Racial disparity in academia is a widely acknowledged problem. The quantitative understanding of race-based systemic inequalities is an important step towards a more equitable research system. However, because of the lack of robust information on authors' race, few large-scale analyses have been performed on this topic. Algorithmic approaches offer one solution, using known information about authors, such as their names, to infer their perceived race. As with any other algorithm, the process of racial inference can generate biases if it is not carefully considered. The goal of this article is to assess the extent to which algorithmic bias is introduced using different approaches for name-based racial inference. We use information from the U.S. Census and mortgage applications to infer the race of U.S.-affiliated authors in the Web of Science. We estimate the effects of using given and family names, thresholds or continuous distributions, and imputation. Our results demonstrate that the validity of name-based inference varies by race/ethnicity and that threshold approaches underestimate Black authors and overestimate White authors. We conclude with recommendations to avoid potential biases. This article lays the foundation for more systematic and less-biased investigations into racial disparities in science.
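The contrast the abstract draws between threshold approaches and continuous distributions can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the name-to-race probabilities and the 0.7 cutoff below are invented for demonstration, whereas real analyses would draw probabilities from sources such as U.S. Census surname tables.

```python
# Hypothetical sketch contrasting threshold vs. continuous name-based inference.
# All probabilities and the cutoff are invented for illustration; real analyses
# would use empirical P(race | name) estimates, e.g. from U.S. Census data.

THRESHOLD = 0.7  # hypothetical cutoff for the threshold approach

# Invented probability distributions P(race | family name).
NAME_PROBS = {
    "washington": {"Black": 0.87, "White": 0.05, "Hispanic": 0.03, "Asian": 0.05},
    "miller":     {"Black": 0.11, "White": 0.84, "Hispanic": 0.02, "Asian": 0.03},
    "lee":        {"Black": 0.02, "White": 0.35, "Hispanic": 0.02, "Asian": 0.61},
}

def threshold_inference(name):
    """Assign a single race if its probability exceeds THRESHOLD, else None."""
    probs = NAME_PROBS.get(name.lower())
    if not probs:
        return None
    race, p = max(probs.items(), key=lambda kv: kv[1])
    return race if p >= THRESHOLD else None

def continuous_inference(names):
    """Sum the full distributions, yielding fractional author counts per race."""
    totals = {}
    for name in names:
        for race, p in NAME_PROBS.get(name.lower(), {}).items():
            totals[race] = totals.get(race, 0.0) + p
    return totals

authors = ["Washington", "Miller", "Lee"]
# The threshold approach discards "Lee" (no race reaches 0.7), so authors
# whose names are not strongly associated with one group are lost entirely.
assigned = [threshold_inference(n) for n in authors]
# The continuous approach retains every author as fractional counts.
fractional = continuous_inference(authors)
```

Under these invented numbers, the threshold approach assigns races to two of three authors and drops the third, while the continuous approach preserves all three as fractional counts; this is one mechanism by which thresholding can systematically under- or overestimate particular groups.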

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias
  • Censuses
  • Ethnicity*
  • Humans
  • Names*
  • United States

Grants and funding

VL acknowledges funding from the Canada Research Chairs program, https://www.chairs-chaires.gc.ca/ (grant # 950-231768). DK acknowledges funding from the Luxembourg National Research Fund, https://www.fnr.lu/, under the PRIDE program (PRIDE17/12252781). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.