Machine learning using longitudinal prescription and medical claims for the detection of non-alcoholic steatohepatitis (NASH)

BMJ Health Care Inform. 2022 Mar;29(1):e100510. doi: 10.1136/bmjhci-2021-100510.

Abstract

Objectives: To develop and evaluate machine learning models to detect patients with suspected undiagnosed non-alcoholic steatohepatitis (NASH) for diagnostic screening and clinical management.

Methods: In this retrospective observational non-interventional study using administrative medical claims data from 1 463 089 patients, gradient-boosted decision trees were trained to detect patients with likely NASH from an at-risk patient population with a history of obesity, type 2 diabetes mellitus, metabolic disorder or non-alcoholic fatty liver (NAFL). Models were trained to detect likely NASH in all at-risk patients or in the subset without a prior NAFL diagnosis (at-risk non-NAFL patients). Models were trained and validated using retrospective medical claims data and assessed using area under precision recall curves and receiver operating characteristic curves (AUPRCs and AUROCs).
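
For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how AUROC and AUPRC can be computed from model scores. It is not the study's code: the abstract does not say which AUPRC estimator was used, so the step-wise "average precision" form is assumed here. The function names and the eight-patient toy data are hypothetical and synthetic.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Tied scores count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Assumption: scores are untied; tied scores would need grouping.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at each recovered positive
    return total / n_pos


# Synthetic toy data: 2 positives among 8 patients (not from the study).
toy_labels = [0, 0, 1, 0, 1, 0, 0, 0]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.20, 0.30, 0.05]

toy_auroc = auroc(toy_labels, toy_scores)
toy_auprc = average_precision(toy_labels, toy_scores)
print(f"AUROC = {toy_auroc:.3f}")  # AUROC = 0.833
print(f"AUPRC = {toy_auprc:.3f}")  # AUPRC = 0.750
```

AUROC is insensitive to class balance, whereas AUPRC depends on it, which is why both are usually reported together for rare outcomes.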

Results: The 6-month incidences of NASH in claims data were 1 per 1437 at-risk patients and 1 per 2127 at-risk non-NAFL patients. The model trained to detect NASH in all at-risk patients had an AUPRC of 0.0107 (95% CI 0.0104 to 0.0110) and an AUROC of 0.84. At 10% recall, model precision was 4.3%, which is 60× above NASH incidence. The model trained to detect NASH in the non-NAFL cohort had an AUPRC of 0.0030 (95% CI 0.0029 to 0.0031) and an AUROC of 0.78. At 10% recall, model precision was 1%, which is 20× above NASH incidence.
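
The "60×" and "20×" figures can be checked by dividing each reported precision by the corresponding incidence. The sketch below does only that arithmetic on the numbers quoted in the Results. The variable and function names are illustrative, and the reported multiples appear to be rounded.

```python
def lift(precision, incidence):
    """Ratio of model precision to the baseline incidence of the outcome."""
    return precision / incidence


# Figures quoted in the Results paragraph.
incidence_all_at_risk = 1 / 1437   # 6-month NASH incidence, all at-risk patients
incidence_non_nafl = 1 / 2127      # 6-month NASH incidence, at-risk non-NAFL patients

precision_all_at_risk = 0.043      # precision at 10% recall, all at-risk model
precision_non_nafl = 0.01          # precision at 10% recall, non-NAFL model

lift_all = lift(precision_all_at_risk, incidence_all_at_risk)
lift_non_nafl = lift(precision_non_nafl, incidence_non_nafl)

print(f"All at-risk: {lift_all:.1f}x incidence")    # All at-risk: 61.8x incidence
print(f"Non-NAFL:    {lift_non_nafl:.1f}x incidence")  # Non-NAFL:    21.3x incidence
```

Both values agree with the rounded multiples in the text: roughly 62× and 21× before rounding.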

Conclusion: The low incidence of NASH in medical claims data corroborates the pattern of NASH underdiagnosis in clinical practice. Claims-based machine learning could facilitate the detection of patients with probable NASH for diagnostic testing and disease management.

Keywords: BMJ Health Informatics; artificial intelligence; data science; machine learning; medical records.

Publication types

  • Observational Study

MeSH terms

  • Diabetes Mellitus, Type 2* / diagnosis
  • Diabetes Mellitus, Type 2* / epidemiology
  • Humans
  • Machine Learning
  • Non-alcoholic Fatty Liver Disease* / diagnosis
  • Non-alcoholic Fatty Liver Disease* / epidemiology
  • Non-alcoholic Fatty Liver Disease* / etiology
  • Prescriptions
  • Retrospective Studies