Development of a Semiautomated Search Tool to Identify Grading From Pathology Reports for Tumors of the CNS and Prostate Cancers

JCO Clin Cancer Inform. 2021 Dec:5:1189-1196. doi: 10.1200/CCI.21.00049.

Abstract

Purpose: This study demonstrates the use of semiautomated algorithms to classify cancer-specific grades from electronic pathology reports generated at military treatment facilities. Two Perl-based algorithms are validated for classifying WHO grade for tumors of the CNS and Gleason grade for prostate cancer.
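The abstract does not reproduce the Perl code, but grade extraction of this kind can be sketched as pattern matching over report text. The following Python sketch is an assumption, not the authors' method. The regular expressions `GLEASON_RE` and `WHO_RE` and the helpers `extract_gleason`, `extract_who_grade`, and `triage` are hypothetical. Reports with no recognizable grade are routed to manual review, mirroring the hand-review step described in the Methods and Results.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for illustration only; the study's Perl expressions are not published here.
# Matches phrasings such as "Gleason score 3+4=7" or "GLEASON GRADE: 4 + 3".
GLEASON_RE = re.compile(
    r"gleason\s+(?:score|grade|sum)?\s*:?\s*"
    r"(?P<primary>[1-5])\s*\+\s*(?P<secondary>[1-5])"
    r"(?:\s*=\s*(?P<total>\d{1,2}))?",
    re.IGNORECASE,
)

# Matches "WHO grade II" or "CNS WHO grade 2" (Roman or Arabic numerals I to IV).
WHO_RE = re.compile(r"\bWHO\s+grade\s*:?\s*(?P<grade>IV|III|II|I|[1-4])\b", re.IGNORECASE)
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}


def extract_gleason(report: str) -> Optional[Tuple[int, int, int]]:
    """Return (primary, secondary, total) Gleason values, or None if no pattern matches."""
    m = GLEASON_RE.search(report)
    if m is None:
        return None
    primary = int(m.group("primary"))
    secondary = int(m.group("secondary"))
    # Use the stated total when present; otherwise sum the two patterns.
    total = int(m.group("total")) if m.group("total") else primary + secondary
    return primary, secondary, total


def extract_who_grade(report: str) -> Optional[int]:
    """Return the WHO grade as an integer 1 to 4, or None if no pattern matches."""
    m = WHO_RE.search(report)
    if m is None:
        return None
    grade = m.group("grade").upper()
    return ROMAN[grade] if grade in ROMAN else int(grade)


def triage(report: str) -> str:
    """Route a report: automatic grade if one is extracted, otherwise hand review."""
    if extract_gleason(report) is not None or extract_who_grade(report) is not None:
        return "auto_graded"
    return "manual_review"
```

Phrasings the patterns do not cover, such as "Gleason score 7 (3+4)", return None and fall through to manual review instead of being misclassified.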

Methods: Case-finding cohorts were developed using diagnostic codes and matched by unique identifiers to obtain pathology records generated in the Military Health System for active-duty service members from 2013 to 2018. Perl-based algorithms were applied to classify document-based pathology reports to identify malignant CNS tumors and prostate cancer, followed by a hand-review process to determine the accuracy of the algorithm classifications. Inter-rater reliability, sensitivity, specificity, positive predictive values (PPVs), and negative predictive values (NPVs) were computed following abstractor adjudication.
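The validation statistics named in the Methods follow from a 2x2 table that compares each algorithm call with the adjudicated hand review. The Python sketch below assumes that structure. The class `Confusion` and the functions `validation_metrics` and `cohens_kappa` are hypothetical names. Cohen's kappa is shown only as one common inter-rater statistic, because the abstract does not say which measure was used. The confidence intervals reported in the Results are omitted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """2x2 counts: algorithm classification vs. adjudicated hand review (reference)."""
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative


def validation_metrics(c: Confusion) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def cohens_kappa(both_yes: int, r1_only: int, r2_only: int, both_no: int) -> float:
    """Cohen's kappa for two abstractors making a binary call on the same records."""
    n = both_yes + r1_only + r2_only + both_no
    observed = (both_yes + both_no) / n  # proportion of records where raters agree
    r1_yes, r2_yes = both_yes + r1_only, both_yes + r2_only
    r1_no, r2_no = n - r1_yes, n - r2_yes
    # Agreement expected by chance from each rater's marginal totals.
    expected = (r1_yes * r2_yes + r1_no * r2_no) / n ** 2
    return (observed - expected) / (1 - expected)
```

For example, with hypothetical counts `Confusion(tp=80, fp=1, fn=20, tn=99)`, sensitivity is 80/100 = 0.80 and PPV is 80/81. This shows how a high PPV can coexist with a lower sensitivity when ungraded reports are missed.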

Results: The Perl-based algorithms achieved high PPVs for classifying CNS tumors (PPV > 98%) and prostate cancer (PPV > 99%). These results support their use in classifying malignancies for cancer surveillance operations when paired with a hand-reviewed, semiautomated process that increases sensitivity by capturing ungraded cancers. Early detection was pronounced: 33.6% of malignant CNS records carried a WHO grade of II, and 50.7% of malignant prostate records carried a Gleason score of 6. Sensitivity met the criterion of > 75% for brain (79.9%; 95% CI, 73.0 to 85.7) and prostate (96.7%; 95% CI, 94.9 to 98.0) cancers.

Conclusion: Semiautomated, document-based text classification using Perl coding successfully used WHO and Gleason grade identification to classify pathology records for CNS tumors and prostate cancer. The process is recommended for data quality initiatives that support cancer reporting functions, epidemiology, and research.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Humans
  • Male
  • Neoplasm Grading
  • Prostate / pathology
  • Prostatic Neoplasms* / diagnosis
  • Prostatic Neoplasms* / epidemiology
  • Reproducibility of Results