Automated detection of causal relationships among diseases and imaging findings in textual radiology reports

J Am Med Inform Assoc. 2023 Sep 25;30(10):1701-1706. doi: 10.1093/jamia/ocad119.

Abstract

Objective: Textual radiology reports contain a wealth of information that may help understand associations among diseases and imaging observations. This study evaluated the ability to detect causal associations among diseases and imaging findings from their co-occurrence in radiology reports.

Materials and methods: This IRB-approved and HIPAA-compliant study analyzed 1 702 462 consecutive reports of 1 396 293 patients; patient consent was waived. Reports were analyzed for positive mentions of 16 839 entities (disorders and imaging findings) of the Radiology Gamuts Ontology (RGO). Entities that occurred in fewer than 25 patients were excluded. A Bayesian network structure-learning algorithm was applied at a threshold of P < 0.05; its edges were evaluated as possible causal relationships. RGO and/or physician consensus served as ground truth.
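
The entity-filtering step described above can be illustrated with a minimal sketch. It assumes the positive mentions have already been extracted into a mapping from patient to entity names; the function names, the toy data, and the pairwise co-occurrence tally are illustrative assumptions, not the study's code. The structure-learning algorithm itself is not specified in the abstract and is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def filter_entities(patient_entities, min_patients=25):
    """Return the entities positively mentioned in at least `min_patients` patients."""
    counts = Counter()
    for entities in patient_entities.values():
        counts.update(set(entities))  # count each entity once per patient
    return {entity for entity, n in counts.items() if n >= min_patients}


def cooccurrence_counts(patient_entities, kept):
    """Count, for each unordered pair of kept entities, the patients who have both."""
    pair_counts = Counter()
    for entities in patient_entities.values():
        present = sorted(set(entities) & kept)  # sorted so each pair has one key
        pair_counts.update(combinations(present, 2))
    return pair_counts


# Hypothetical toy data; the threshold is lowered from 25 to 2 so the example is small.
toy = {
    "p1": {"cirrhosis", "ascites"},
    "p2": {"cirrhosis", "ascites", "splenomegaly"},
    "p3": {"cirrhosis"},
    "p4": {"pneumothorax"},
}
kept = filter_entities(toy, min_patients=2)
pairs = cooccurrence_counts(toy, kept)
```

A per-patient table of this kind is one plausible input for a structure-learning step; the abstract does not state how the study encoded its data.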

Results: 2742 of 16 839 RGO entities were included; 53 849 patients (3.9%) had at least one included entity. The algorithm identified 725 pairs of entities as causally related; 634 were confirmed by reference to RGO or physician review (87% precision). As shown by its positive likelihood ratio, the algorithm increased detection of causally associated entities 6876-fold.
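
The two metrics reported above follow standard definitions, sketched below. The precision call uses the counts stated in the abstract (634 confirmed of 725 identified pairs). The abstract does not give the full confusion matrix behind the reported likelihood ratio, so the counts passed to `positive_likelihood_ratio` are hypothetical and do not attempt to reproduce the 6876-fold figure.

```python
def precision(true_positives, predicted_positives):
    """Fraction of the pairs flagged by the algorithm that were confirmed."""
    return true_positives / predicted_positives


def positive_likelihood_ratio(tp, fp, fn, tn):
    """LR+ = sensitivity / (1 - specificity)."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    return sensitivity / false_positive_rate


# Counts from the abstract: 634 of 725 identified pairs were confirmed.
print(round(precision(634, 725), 3))  # 0.874

# Hypothetical counts, for illustration only (not the study's data).
lr_plus = positive_likelihood_ratio(tp=50, fp=10, fn=50, tn=9990)
```

When true relationships are a very small share of all candidate pairs, the false-positive rate in the denominator becomes tiny, which is why a likelihood ratio can reach the thousands even at moderate sensitivity.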

Discussion: Causal relationships among diseases and imaging findings can be detected with high precision from textual radiology reports.

Conclusion: This approach finds causal relationships among diseases and imaging findings with high precision from textual radiology reports, even though causally related entities represent only 0.039% of all pairs of entities. Applying this approach to larger report text corpora may help detect unspecified or heretofore unrecognized associations.

Keywords: biomedical ontologies (D064229); correlation of data (D000078331); data mining (D057225); etiology (Q000209); machine learning (D000069550); natural language processing (D009323); radiology (D011871); radiology information systems (D011873).

MeSH terms

  • Bayes Theorem
  • Diagnostic Imaging
  • Humans
  • Natural Language Processing
  • Radiography
  • Radiology Information Systems*
  • Radiology*
