Development and Evaluation of Perl-Based Algorithms to Classify Neoplasms From Pathology Records in Synoptic Report Format

JCO Clin Cancer Inform. 2021 Mar:5:295-303. doi: 10.1200/CCI.20.00152.

Abstract

Purpose: Synoptic reporting provides a mechanism for uniform and structured pathology diagnostics. This paper demonstrates the use of Perl regular expressions with alternation and grouping to classify electronic pathology reports generated at military treatment facilities. Eight Perl-based algorithms are validated to classify malignant melanoma, Hodgkin lymphoma, non-Hodgkin lymphoma, leukemia, and malignant neoplasms of the breast, ovary, testis, and thyroid.

Methods: Case finding cohorts were developed using diagnostic codes for neoplasm groups and matched by unique identifiers to obtain pathology records. Preprocessing techniques and Perl-based algorithms were applied to classify records as malignant, in situ, suspect, or nonapplicable, followed by a hand-review process to determine the accuracy of the algorithm classifications. Interrater reliability, sensitivity, specificity, positive predictive values, and negative predictive values were computed following abstractor adjudication.
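The classification step described above can be illustrated with a minimal sketch. It is written in Python rather than SAS, since Python's `re` module supports the same alternation (`|`) and grouping (`(?:...)`) syntax. The term lists, the whitespace-only preprocessing, and the order in which the categories are checked are all assumptions made for illustration; the abstract does not give the authors' actual patterns.

```python
import re

# Hypothetical term lists for illustration only; the paper's real patterns
# are not reproduced in the abstract.
SUSPECT = re.compile(
    r"\b(?:suspicious|suggestive|concerning)\s+(?:for|of)\b",
    re.IGNORECASE,
)
IN_SITU = re.compile(
    r"\b(?:carcinoma|melanoma)\s+in\s+situ\b",
    re.IGNORECASE,
)
MALIGNANT = re.compile(
    r"\b(?:malignant\s+(?:melanoma|neoplasm)"
    r"|(?:invasive|infiltrating)\s+(?:ductal|lobular)\s+carcinoma"
    r"|(?:non-)?hodgkin\s+lymphoma)\b",
    re.IGNORECASE,
)


def preprocess(text):
    """Collapse runs of whitespace (including line breaks) to single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def classify(report):
    """Return one of: 'suspect', 'in situ', 'malignant', 'nonapplicable'.

    The check order is an assumption: hedged language wins over a malignant
    term, and an in situ phrase wins over a plain malignant match.
    """
    text = preprocess(report)
    if SUSPECT.search(text):
        return "suspect"
    if IN_SITU.search(text):
        return "in situ"
    if MALIGNANT.search(text):
        return "malignant"
    return "nonapplicable"
```

Under these assumed patterns, a report reading "Invasive ductal carcinoma, left breast" is classified as malignant, while "Atypical cells suspicious for malignant melanoma" is classified as suspect because the hedging phrase is checked first.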

Results: The specificity of the Perl-based algorithms was consistently high, over 98%. The algorithms classified very few benign results as malignant or in situ; leukemia was the only classification group with a positive predictive value below 95% (91.9%). Three classification groups had a sensitivity below 80%: malignant neoplasm of the ovary (33.3%), leukemia (52.8%), and non-Hodgkin lymphoma (62.9%). The pathology records in these groups contained substantial linguistic variation.
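For readers less familiar with the reported measures, the following sketch shows how sensitivity, specificity, positive predictive value, and negative predictive value follow from a 2x2 table of algorithm classifications against adjudicated review. The function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard accuracy measures from 2x2 confusion counts.

    tp: algorithm positive, review positive
    fp: algorithm positive, review negative
    fn: algorithm negative, review positive
    tn: algorithm negative, review negative
    A measure is None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

With made-up counts such as `diagnostic_metrics(tp=90, fp=5, fn=10, tn=895)`, sensitivity is 90/100 and specificity is 895/900. Specificity and positive predictive value can both stay high while sensitivity is low, because false negatives enter only the sensitivity and negative predictive value denominators.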

Conclusion: This paper demonstrates the utility of algorithm logic built around synoptic reporting to identify neoplasms from electronic pathology results. Its major strength is the application of Perl-based coding in SAS, an accessible software application, to develop algorithms that remain highly specific across institutional variation in diagnostic documentation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Electronic Health Records*
  • Female
  • Humans
  • Male
  • Melanoma*
  • Reproducibility of Results
  • Software