A Racially Unbiased, Machine Learning Approach to Prediction of Mortality: Algorithm Development Study

Angier Allen; Samson Mataraso; Anna Siefkas; Hoyt Burdick; Gregory Braden; R Phillip Dellinger; Andrea McCoy; Emily Pellegrini; Jana Hoffman; Abigail Green-Saxena; Gina Barnes; Jacob Calvert; Ritankar Das

doi:10.2196/22400

JMIR Public Health Surveill. 2020 Oct 22;6(4):e22400. doi: 10.2196/22400.

Affiliations

¹ Dascena, Inc, San Francisco, CA, United States.
² Cabell Huntington Hospital, Huntington, WV, United States.
³ Marshall University School of Medicine, Huntington, WV, United States.
⁴ Kidney Care and Transplant Associates of New England, Springfield, MA, United States.
⁵ Division of Critical Care Medicine, Cooper University Hospital/Cooper Medical School of Rowan University, Camden, NJ, United States.
⁶ Cape Regional Medical Center, Cape May Court House, NJ, United States.

PMID: 33090117
PMCID: PMC7644374
DOI: 10.2196/22400

Abstract

Background: Racial disparities in health care are well documented in the United States. As machine learning methods become more common in health care settings, it is important to ensure that these methods do not contribute to racial disparities through biased predictions or differential accuracy across racial groups.

Objective: The goal of the research was to assess a machine learning algorithm intentionally developed to minimize bias in in-hospital mortality predictions between white and nonwhite patient groups.

Methods: Bias was minimized through preprocessing of algorithm training data. We performed a retrospective analysis of electronic health record data from patients admitted to the intensive care unit (ICU) at a large academic health center between 2001 and 2012, drawing data from the Medical Information Mart for Intensive Care III (MIMIC-III) database. Patients were included if they had at least 10 hours of available measurements after ICU admission, had at least one recorded value for every measurement used for model prediction, and had recorded race/ethnicity data. Bias was assessed through the equal opportunity difference. Model performance in terms of bias and accuracy was compared with the Modified Early Warning Score (MEWS), the Simplified Acute Physiology Score II (SAPS II), and the Acute Physiologic Assessment and Chronic Health Evaluation (APACHE).
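The three inclusion criteria stated in the Methods can be expressed as a simple filter. The following is a minimal sketch, not the study's code: the `IcuStay` record, its field names, and the contents of `REQUIRED_MEASUREMENTS` are hypothetical, since the abstract does not list the measurements used for prediction.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence

# Hypothetical placeholder list; the abstract does not name the model inputs.
REQUIRED_MEASUREMENTS = ("heart_rate", "respiratory_rate", "temperature")


@dataclass
class IcuStay:
    """Hypothetical per-stay summary used only for this illustration."""
    stay_id: int
    hours_of_data: float                      # hours of measurements after ICU admission
    measurement_counts: Dict[str, int] = field(default_factory=dict)
    race: Optional[str] = None                # recorded race/ethnicity, if any


def meets_inclusion_criteria(
    stay: IcuStay,
    required: Sequence[str] = REQUIRED_MEASUREMENTS,
    min_hours: float = 10.0,
) -> bool:
    """Return True if the stay satisfies all three stated criteria."""
    has_enough_hours = stay.hours_of_data >= min_hours
    # At least one recorded value for every required measurement.
    has_all_measurements = all(
        stay.measurement_counts.get(name, 0) >= 1 for name in required
    )
    has_race = bool(stay.race and stay.race.strip())
    return has_enough_hours and has_all_measurements and has_race
```

A cohort could then be built with a list comprehension such as `[s for s in stays if meets_inclusion_criteria(s)]`.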

Results: The machine learning algorithm was found to be more accurate than all comparators, with a higher sensitivity, specificity, and area under the receiver operating characteristic curve. The machine learning algorithm was found to be unbiased (equal opportunity difference 0.016, P=.20). APACHE was also found to be unbiased (equal opportunity difference 0.019, P=.11), while SAPS II and MEWS were found to have significant bias (equal opportunity difference 0.038, P=.006 and equal opportunity difference 0.074, P<.001, respectively).
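The equal opportunity difference is commonly defined as the difference in true positive rate (sensitivity) between two groups, so a value near 0 means that patients who died were flagged at similar rates in both groups. The sketch below illustrates that standard definition only. The function names, the 0/1 group coding, and the sign convention (group 0 minus group 1) are assumptions for illustration and are not taken from the study, and the significance testing behind the reported P values is not reproduced here.

```python
from typing import Sequence


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives (label 1) that were predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_positives:
        raise ValueError("group contains no positive cases")
    return sum(predictions_for_positives) / len(predictions_for_positives)


def equal_opportunity_difference(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[int],
) -> float:
    """TPR of group 0 minus TPR of group 1 (sign convention is an assumption)."""
    tpr = {}
    for g in (0, 1):
        group_true = [t for t, gg in zip(y_true, group) if gg == g]
        group_pred = [p for p, gg in zip(y_pred, group) if gg == g]
        tpr[g] = true_positive_rate(group_true, group_pred)
    return tpr[0] - tpr[1]


# Toy data, invented for illustration only.
y_true = [1, 1, 0, 1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
eod = equal_opportunity_difference(y_true, y_pred, group)
```

In the toy data, group 0 has a true positive rate of 2/3 and group 1 has a true positive rate of 1, so the signed difference is -1/3. Reporting the absolute value, as the positive figures above suggest, removes the dependence on which group is labeled 0.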

Conclusions: This study indicates there may be significant racial bias in commonly used severity scoring systems and that machine learning algorithms may reduce bias while improving on the accuracy of these methods.

Keywords: health disparities; machine learning; mortality; prediction; racial disparities.

©Angier Allen, Samson Mataraso, Anna Siefkas, Hoyt Burdick, Gregory Braden, R Phillip Dellinger, Andrea McCoy, Emily Pellegrini, Jana Hoffman, Abigail Green-Saxena, Gina Barnes, Jacob Calvert, Ritankar Das. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 22.10.2020.

MeSH terms

APACHE
Adult
Aged
Algorithms
Cohort Studies
Early Warning Score
Electronic Health Records / statistics & numerical data
Female
Forecasting / methods*
Hospital Mortality*
Humans
Machine Learning / standards*
Machine Learning / statistics & numerical data
Male
Middle Aged
Retrospective Studies
Simplified Acute Physiology Score