Investigation of bias in an epilepsy machine learning algorithm trained on physician notes

Benjamin D Wissel; Hansel M Greiner; Tracy A Glauser; Francesco T Mangano; Daniel Santel; John P Pestian; Rhonda D Szczesniak; Judith W Dexheimer

doi:10.1111/epi.16320

Epilepsia. 2019 Sep;60(9):e93-e98. doi: 10.1111/epi.16320. Epub 2019 Aug 23.

Authors

Benjamin D Wissel¹, Hansel M Greiner²,³, Tracy A Glauser²,³, Francesco T Mangano²,³,⁴, Daniel Santel¹, John P Pestian¹,², Rhonda D Szczesniak²,⁵, Judith W Dexheimer¹,²,⁶

Affiliations

¹ Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio.
² Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio.
³ Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio.
⁴ Division of Neurosurgery, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio.
⁵ Division of Biostatistics & Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio.
⁶ Division of Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio.

Abstract

Racial disparities in the utilization of epilepsy surgery are well documented, but it is unknown whether a natural language processing (NLP) algorithm trained on physician notes would produce biased recommendations for epilepsy presurgical evaluations. To assess this, an NLP algorithm was trained to identify potential surgical candidates using 1097 notes from 175 epilepsy patients with a history of resective epilepsy surgery and 268 patients who achieved seizure freedom without surgery (total N = 443 patients). The model was tested on 8340 notes from 3776 patients with epilepsy whose surgical candidacy status was unknown (2029 male, 1747 female, median age = 9 years; age range = 0-60 years). Multiple linear regression using demographic variables as covariates was used to test for correlations between patient race and surgical candidacy scores. After accounting for other demographic and socioeconomic variables, patient race, gender, and primary language did not influence surgical candidacy scores (P > .35 for all). Higher scores were given to patients who were >18 years old, who traveled farther to receive care, who had a higher family income, and who had public insurance (P < .001, .001, .001, and .01, respectively). Demographic effects on surgical candidacy scores appeared to reflect patterns in patient referrals.
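The regression step described above can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the variable names (`group`, `adult`, `distance`, `score`) and the helper `ols_with_pvalues` are hypothetical, and the P values use a normal approximation to the t distribution, which is reasonable only for large samples. The sketch fits a candidacy-style score on several covariates at once, so each coefficient reflects the association with one variable after adjusting for the others.

```python
import math
import numpy as np


def ols_with_pvalues(X, y):
    """Ordinary least squares for y = X @ beta.

    Returns coefficients, standard errors, and two-sided P values.
    P values use a normal approximation (assumes a large sample).
    """
    n, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    z = beta / se
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return beta, se, p


# Synthetic, illustrative data only (not from the study).
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)      # hypothetical binary demographic indicator
adult = rng.integers(0, 2, n)      # hypothetical indicator for age > 18 years
distance = rng.normal(0.0, 1.0, n) # hypothetical standardized travel distance

# By construction, the score depends on age and distance but not on group.
score = 0.5 * adult + 0.3 * distance + rng.normal(0.0, 1.0, n)

# Design matrix: intercept plus one column per covariate.
X = np.column_stack([np.ones(n), group, adult, distance])
beta, se, p = ols_with_pvalues(X, score)

for name, b, s, pv in zip(["intercept", "group", "adult", "distance"], beta, se, p):
    print(f"{name:>9}: coef={b:+.3f}  se={s:.3f}  p={pv:.3g}")
```

In this toy setup, a coefficient for `group` near zero after adjustment would correspond to the kind of finding the abstract reports for race, gender, and primary language; a real analysis would use a statistics package with exact t-based inference and model diagnostics.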

Keywords: clinical decision support; epilepsy surgery; machine learning; natural language processing.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Adolescent
Adult
Age Factors
Algorithms
Child
Child, Preschool
Electroencephalography
Epilepsy / surgery*
Healthcare Disparities*
Humans
Infant
Machine Learning*
Middle Aged
Patient Selection*
Prejudice*
Referral and Consultation
Young Adult
