Unsupervised machine learning for identifying important visual features through bag-of-words using histopathology data from chronic kidney disease

Joonsang Lee; Elisa Warner; Salma Shaikhouni; Markus Bitzer; Matthias Kretzler; Debbie Gipson; Subramaniam Pennathur; Keith Bellovich; Zeenat Bhat; Crystal Gadegbeku; Susan Massengill; Kalyani Perumal; Jharna Saha; Yingbao Yang; Jinghui Luo; Xin Zhang; Laura Mariani; Jeffrey B Hodgin; Arvind Rao; C-PROBE Study

doi:10.1038/s41598-022-08974-8

Sci Rep. 2022 Mar 22;12(1):4832. doi: 10.1038/s41598-022-08974-8.

Authors

Joonsang Lee¹, Elisa Warner¹, Salma Shaikhouni², Markus Bitzer², Matthias Kretzler², Debbie Gipson³, Subramaniam Pennathur², Keith Bellovich⁴, Zeenat Bhat⁵, Crystal Gadegbeku⁶, Susan Massengill⁷, Kalyani Perumal⁸, Jharna Saha⁹, Yingbao Yang⁹, Jinghui Luo⁹, Xin Zhang¹, Laura Mariani², Jeffrey B Hodgin^#¹⁰, Arvind Rao^#¹¹,¹²,¹³,¹⁴; C-PROBE Study

Affiliations

¹ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
² Department of Internal Medicine, Nephrology, University of Michigan, Ann Arbor, MI, USA.
³ Department of Pediatrics, Pediatric Nephrology, University of Michigan, Ann Arbor, MI, USA.
⁴ Department of Internal Medicine, Nephrology, St. Clair Nephrology Research, Detroit, MI, USA.
⁵ Department of Internal Medicine, Nephrology, Wayne State University, Detroit, MI, USA.
⁶ Department of Internal Medicine, Nephrology, Cleveland Clinic, Cleveland, OH, USA.
⁷ Department of Pediatrics, Pediatric Nephrology, Levine Children's Hospital, Charlotte, NC, USA.
⁸ Department of Internal Medicine, Nephrology, Department of JH Stroger Hospital, Chicago, IL, USA.
⁹ Department of Pathology, University of Michigan, Ann Arbor, MI, USA.
¹⁰ Department of Pathology, University of Michigan, Ann Arbor, MI, USA. jhodgin@med.umich.edu.
¹¹ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA. ukarvind@med.umich.edu.
¹² Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA. ukarvind@med.umich.edu.
¹³ Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA. ukarvind@med.umich.edu.
¹⁴ Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA. ukarvind@med.umich.edu.

^# Contributed equally.

Abstract

Pathologists use visual classification to assess patient kidney biopsy samples when diagnosing the underlying cause of kidney disease. However, the assessment is qualitative, or semi-quantitative at best, and reproducibility is challenging. To discover previously unknown features that predict patient outcomes and to overcome substantial interobserver variability, we developed an unsupervised bag-of-words model. We applied the model to the C-PROBE cohort of patients with chronic kidney disease (CKD). From 107,471 histopathology images obtained from 161 biopsy cores, the model identified important morphological features in biopsy tissue that are highly predictive of the presence of CKD both at the time of biopsy and at one year. To evaluate the performance of our model, we estimated the AUC and its 95% confidence interval. We show that this method is reliable and reproducible and can achieve an AUC of 0.93 at predicting glomerular filtration rate at the time of biopsy as well as predicting a loss of function at one year. Additionally, with this method, we ranked the identified morphological features according to their importance as diagnostic markers for chronic kidney disease. In this study, we have demonstrated the feasibility of using an unsupervised machine learning method, without human input, to predict the level of kidney function in CKD. The results from our study indicate that the visual dictionary, or visual image pattern, obtained from unsupervised machine learning can predict outcomes using machine-derived values that correspond to both known and unknown clinically relevant features.
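
The abstract names the general technique (an unsupervised bag-of-words model evaluated by AUC with a 95% confidence interval) but does not give implementation details. The following is a minimal, generic sketch of that family of methods, not the authors' pipeline: it assumes each image patch has already been reduced to a numeric feature vector, builds a visual dictionary with plain k-means, represents a biopsy as a histogram over dictionary words, and estimates an AUC interval with a percentile bootstrap. All function names, the choice of k-means, and the bootstrap are illustrative assumptions.

```python
import random
from collections import Counter


def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centroids act as the 'visual words'.

    Assumption: the abstract does not say which clustering method was used.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bow_histogram(patches, centroids):
    """Normalized count of visual words among one biopsy's patch vectors."""
    counts = Counter(nearest(p, centroids) for p in patches)
    return [counts.get(j, 0) / len(patches) for j in range(len(centroids))]


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, seed=0):
    """Percentile-bootstrap 95% interval for the AUC (an assumed CI method)."""
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # a resample needs both classes
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]
```

In such a sketch, the per-biopsy histograms would be passed to any downstream classifier, and the classifier's scores to `auc` and `bootstrap_auc_ci`; ranking dictionary words by their contribution to the classifier is one way the feature-importance ranking described above could be approached.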

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Biopsy
Female
Glomerular Filtration Rate
Humans
Male
Renal Insufficiency, Chronic* / diagnosis
Reproducibility of Results
Unsupervised Machine Learning*
