Artificial Intelligence-Driven Structurization of Diagnostic Information in Free-Text Pathology Reports

J Pathol Inform. 2020 Feb 11;11:4. doi: 10.4103/jpi.jpi_30_19. eCollection 2020.

Abstract

Background: Free-text sections of pathology reports contain the most important information from a diagnostic standpoint. However, this information is largely underutilized for computer-based analytics. The vast majority of natural language processing (NLP)-based methods lack the capacity to accurately extract complex diagnostic entities and the relationships among them, or to provide an adequate knowledge representation for downstream data-mining applications.

Methods: In this paper, we introduce a novel informatics pipeline that extends open information extraction (openIE) techniques with artificial intelligence (AI)-based modeling to extract complex diagnostic entities and the relationships among them and to transform them into Knowledge Graphs (KGs) of relational triples (RTs).
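
To make the target representation concrete, the following is a minimal sketch of how relational triples might be assembled into a simple knowledge graph. It is not the pipeline described in the paper: the `RelationalTriple` type, the `build_knowledge_graph` helper, the relation names, and the example triples are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class RelationalTriple(NamedTuple):
    """One (subject, relation, object) statement; a hypothetical type."""
    subject: str
    relation: str
    obj: str


def build_knowledge_graph(triples):
    """Group triples by subject into an adjacency mapping.

    Returns a dict of subject -> list of (relation, object) edges,
    preserving the order in which the triples were given.
    """
    graph = defaultdict(list)
    for triple in triples:
        graph[triple.subject].append((triple.relation, triple.obj))
    return dict(graph)


# Hypothetical triples, as might be derived from a sentence such as
# "Biopsy shows moderately differentiated adenocarcinoma of the sigmoid colon."
triples = [
    RelationalTriple("biopsy", "shows", "adenocarcinoma"),
    RelationalTriple("adenocarcinoma", "has_grade", "moderately differentiated"),
    RelationalTriple("adenocarcinoma", "located_in", "sigmoid colon"),
]

kg = build_knowledge_graph(triples)
for subject, edges in kg.items():
    for relation, obj in edges:
        print(f"{subject} --{relation}--> {obj}")
```

Storing the graph as a mapping from subject to outgoing edges keeps every fact about one diagnostic entity together, which is the kind of lookup a downstream data-mining step would likely need.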

Results: Evaluation studies have demonstrated that the pipeline's output significantly differs from a random process. The semantic similarity with original reports is high (Mean Weighted Overlap of 0.83). The precision and recall of extracted RTs based on experts' assessment were 0.925 and 0.841, respectively (P < 0.0001). Inter-rater agreement was significant at 93.6% and inter-rater reliability was 81.8%.
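
As a reminder of how precision and recall are defined for extracted triples, here is a minimal sketch. The counts are hypothetical and are not the study's data; the function name `precision_recall` is likewise an assumption made for this example.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts.

    precision = TP / (TP + FP): share of extracted triples judged correct.
    recall    = TP / (TP + FN): share of expected triples that were extracted.
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical expert assessment: 8 extracted triples judged correct,
# 2 judged incorrect, and 2 expected triples that were missed.
precision, recall = precision_recall(
    true_positives=8, false_positives=2, false_negatives=2
)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```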

Conclusion: The results demonstrated important properties of the pipeline such as high accuracy, minimality and adequate knowledge representation. Therefore, we conclude that the pipeline can be used in various downstream data-mining applications to assist diagnostic medicine.

Keywords: Free-text pathology reports; information extraction; n-ary modeling; structurization.