Natural Language Processing in Diagnostic Texts from Nephropathology

Maximilian Legnar; Philipp Daumke; Jürgen Hesser; Stefan Porubsky; Zoran Popovic; Jan Niklas Bindzus; Joern-Helge Heinrich Siemoneit; Cleo-Aron Weis

doi:10.3390/diagnostics12071726

Diagnostics (Basel). 2022 Jul 15;12(7):1726. doi: 10.3390/diagnostics12071726.

Authors

Maximilian Legnar¹,², Philipp Daumke³, Jürgen Hesser²,⁴, Stefan Porubsky⁵, Zoran Popovic², Jan Niklas Bindzus², Joern-Helge Heinrich Siemoneit², Cleo-Aron Weis²,⁶

Affiliations

¹ Mannheim Institute for Intelligent Systems in Medicine (MIISM), Medical Faculty Mannheim, Heidelberg University, 68167 Mannheim, Germany.
² Institute of Pathology, Medical Faculty Mannheim, Heidelberg University, 68167 Mannheim, Germany.
³ Averbis GmbH, 79098 Freiburg, Germany.
⁴ Data Analysis and Modeling, MIISM, Medical School, Interdisciplinary Center for Scientific Computing (IWR), Central Institute for Computer Engineering (ZITI), CZS Heidelberg Center for Model-Based AI, Heidelberg University, 69117 Heidelberg, Germany.
⁵ Institute of Pathology, Medical Faculty Mainz, University Hospital Mainz, 55131 Mainz, Germany.
⁶ Institute of Pathology, Medical Faculty Heidelberg, 69120 Heidelberg, Germany.

Abstract

Introduction: This study investigates whether a final diagnosis can be predicted from a written nephropathological description (as a surrogate for image analysis) using various natural language processing (NLP) methods.

Methods: A total of 1107 unlabelled nephropathological reports were included. (i) After each report was separated into its microscopic description section and its diagnosis section, the diagnosis sections were clustered without supervision into fewer than 20 diagnostic groups using different clustering techniques. (ii) Different text classification methods were then used to predict the diagnostic group from the microscopic description section.
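Step (ii) can be illustrated with a minimal sketch, assuming scikit-learn is available. It trains a linear SVM on TF-IDF-weighted bag-of-words features to predict a group label from a description text. The toy descriptions, the labels `group_a` and `group_b`, and the helper `train_classifier` are hypothetical and are not taken from the study; in the study, the labels came from the clustering in step (i), which is not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: invented description texts and invented group labels.
# In the study, the labels would be the diagnostic groups found in step (i).
descriptions = [
    "glomeruli with mesangial hypercellularity and mesangial IgA deposits",
    "mesangial matrix expansion with granular IgA deposits",
    "nodular glomerulosclerosis with thickened basement membranes",
    "diffuse thickened basement membranes and nodular mesangial sclerosis",
]
groups = ["group_a", "group_a", "group_b", "group_b"]


def train_classifier(texts, labels):
    """Fit a TF-IDF bag-of-words vectorizer followed by a linear SVM."""
    model = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))
    model.fit(texts, labels)
    return model


model = train_classifier(descriptions, groups)

# Predict the group of an unseen (again invented) description.
predicted = model.predict(["granular mesangial IgA deposits"])
print(predicted[0])
```

With only four training texts the prediction carries no diagnostic meaning; the sketch shows the shape of the pipeline, not its expected performance.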

Results: (i) The best clustering results were achieved with HDBSCAN using bag-of-words (BoW)-based feature extraction. Based on their keywords, these clusters can be mapped to certain diagnostic groups. (ii) A transformer encoder-based approach and an SVM worked best for predicting the diagnosis from the histomorphological description. Certain diagnostic groups reached F1-scores of up to 0.892, while others achieved only weak classification metrics.
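Because the results are reported per diagnostic group, it may help to recall how a per-class F1-score is computed: F1 = 2·TP / (2·TP + FP + FN), the harmonic mean of precision and recall. The sketch below uses only the Python standard library; the function name `f1_per_class` and the toy labels are hypothetical and the numbers are unrelated to the study's data.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each label to its F1-score."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly gets a score of 0.0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy labels for three groups.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "a"]

scores = f1_per_class(y_true, y_pred)
for label, score in scores.items():
    print(f"{label}: {score:.3f}")
```

In this toy case group `b` scores well, while group `c` is never recognised and scores zero, which mirrors the kind of spread between groups described above.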

Conclusion: While the textual morphological description alone is enough to retrieve the correct diagnosis for some entities, it is insufficient for others. This is in accordance with a previous image analysis study on glomerular change patterns, in which some diagnoses were associated with a single pattern, whereas others showed a complex combination of patterns.

Keywords: BERT; NLP; deep learning; machine learning; nephropathology; text analysis; text classification; topic modelling; transformer encoder.

Grants and funding

ZIM-grant KK5256201LU1/German Federal Ministry for Economic Affairs and Climate Action