A natural language processing algorithm accurately classifies steatotic liver disease pathology to estimate the risk of cirrhosis

Hepatol Commun. 2024 Mar 29;8(4):e0403. doi: 10.1097/HC9.0000000000000403. eCollection 2024 Apr 1.

Abstract

Background: Histopathology remains the gold standard for diagnosing and staging metabolic dysfunction-associated steatotic liver disease (MASLD). The feasibility of studying MASLD progression in electronic medical records based on histological features is limited by the free-text nature of pathology reports. Here we introduce a natural language processing (NLP) algorithm to automatically score MASLD histology features.

Methods: From the Mass General Brigham health care system electronic medical record, we identified all patients (1987-2021) with steatosis on index liver biopsy after excluding excess alcohol use and other etiologies of liver disease. An NLP algorithm was constructed in Python to detect steatosis, lobular inflammation, ballooning, and fibrosis stage from pathology free-text and was manually validated in >1200 pathology reports. Patients were followed from the index biopsy to incident decompensated liver disease, with adjustment for covariates.
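As a rough illustration of how histologic concepts might be pulled from pathology free-text, the following is a minimal rule-based sketch in Python. It is not the published algorithm: the abstract does not describe its rules, so the patterns, the negation cues, the sentence splitting, and the function name `score_report` are all assumptions made for this example.

```python
import re

# All patterns below are hypothetical; the source does not publish its rules.
FIBROSIS_STAGE = re.compile(r"\bstage\s*([0-4])\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|without|absent|negative for)\b", re.IGNORECASE)
CONCEPTS = {
    "steatosis": re.compile(r"\bsteatosis\b", re.IGNORECASE),
    "lobular_inflammation": re.compile(r"\blobular inflammation\b", re.IGNORECASE),
    "ballooning": re.compile(r"\bballoon(?:ing|ed)\b", re.IGNORECASE),
}


def score_report(text):
    """Return a dict of detected concepts and fibrosis stage for one report."""
    result = {name: False for name in CONCEPTS}
    result["fibrosis_stage"] = None
    # Naive sentence split so that a negation only affects its own sentence.
    for sentence in re.split(r"[.;\n]+", text):
        for name, pattern in CONCEPTS.items():
            match = pattern.search(sentence)
            # Count the concept only if no negation cue precedes it.
            if match and not NEGATION.search(sentence[:match.start()]):
                result[name] = True
        stage = FIBROSIS_STAGE.search(sentence)
        # Accept a stage only when the sentence also mentions fibrosis.
        if stage and re.search(r"fibrosis", sentence, re.IGNORECASE):
            result["fibrosis_stage"] = int(stage.group(1))
    return result


report = (
    "Moderate macrovesicular steatosis. No hepatocyte ballooning. "
    "Mild lobular inflammation. Fibrosis stage 2 of 4."
)
print(score_report(report))
```

A real system would need far broader vocabulary, handling of staging systems and ranges, and the kind of manual validation against annotated reports that the Methods describe.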

Results: The NLP algorithm demonstrated positive and negative predictive values from 93.5% to 100% for all histologic concepts. Among 3134 patients with biopsy-confirmed MASLD followed for 20,604 person-years, rates of the composite endpoint increased monotonically with worsening index fibrosis stage (p for linear trend <0.005). Compared to simple steatosis (incidence rate, 15.06/1000 person-years), the multivariable-adjusted hazard ratios (HRs) for cirrhosis were 1.04 (0.72-1.50) for metabolic dysfunction-associated steatohepatitis (MASH)/F0, 1.19 (0.92-1.54) for MASH/F1, 1.89 (1.41-2.52) for MASH/F2, and 4.21 (3.26-5.43) for MASH/F3.
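For readers less familiar with the metrics reported above, the sketch below shows the standard definitions of positive and negative predictive value and of an incidence rate per 1000 person-years. The function names and all counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # share of algorithm-positive reports that are truly positive
    npv = tn / (tn + fn)  # share of algorithm-negative reports that are truly negative
    return ppv, npv


def incidence_rate_per_1000(events, person_years):
    """Return the number of events per 1000 person-years of follow-up."""
    return 1000 * events / person_years


# Hypothetical counts, not taken from the study.
ppv, npv = predictive_values(tp=90, fp=10, tn=80, fn=20)
rate = incidence_rate_per_1000(events=30, person_years=2000)
print(f"PPV={ppv:.1%}, NPV={npv:.1%}, rate={rate:.2f}/1000 person-years")
```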

Conclusions: The NLP algorithm accurately scores histological features of MASLD from pathology free-text. This algorithm enabled the construction of a large and high-quality MASLD cohort across a multihospital health care system and revealed an accelerating risk for cirrhosis according to the index MASLD fibrosis stage.

MeSH terms

  • Algorithms
  • Biopsy
  • Fatty Liver* / diagnosis
  • Fatty Liver* / epidemiology
  • Humans
  • Liver Cirrhosis / diagnosis
  • Natural Language Processing*