How to discriminate non-small cell lung cancer (NSCLC) cases from an Italian administrative database? A retrospective, secondary data use study for evaluating a novel algorithm performance

BMJ Open. 2021 Sep 24;11(9):e048188. doi: 10.1136/bmjopen-2020-048188.

Abstract

Objectives: To evaluate an algorithm developed to identify non-small cell lung cancer (NSCLC) candidates among patients with lung cancer who have an International Classification of Diseases, Ninth Revision (ICD-9) 162.x diagnosis code in administrative databases. The algorithm could then be applied to identify the NSCLC population in order to assess the appropriateness and quality of care along the NSCLC care pathway.
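
As a minimal sketch of the cohort-selection step only (not the NSCLC algorithm itself, which this abstract does not describe), the snippet below flags records whose diagnosis code falls in the ICD-9 162.x category. The function name, the record layout and the assumption that codes may be stored with or without the dot are illustrative, not taken from the study.

```python
def is_icd9_lung_cancer(code: str) -> bool:
    """Return True if an ICD-9 diagnosis code belongs to category 162."""
    cleaned = code.strip()
    # Assumption: codes may be stored as "162.9" or as "1629" (dot omitted).
    # With a dot, the category is the part before it; without one, the
    # first three characters.
    category = cleaned.split(".")[0] if "." in cleaned else cleaned[:3]
    return category == "162"


# Hypothetical discharge records: (patient id, principal diagnosis code).
records = [
    ("p1", "162.9"),
    ("p2", "1625"),
    ("p3", "174.9"),
    ("p4", " 162.3 "),
]

lung_cancer_ids = [pid for pid, code in records if is_icd9_lung_cancer(code)]
print(lung_cancer_ids)
```

Comparing the category exactly, rather than testing whether the code starts with "162" after removing the dot, avoids matching unrelated codes such as "16.2".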

Design: The algorithm's capacity to discriminate NSCLC from non-NSCLC cases was assessed on a sample for which an electronic health record (EHR) diagnosis was available. A bivariate frequency distribution of algorithm classification against EHR diagnosis and other measures were used to evaluate the algorithm's performance. Associations between algorithm accuracy and factors that might affect it were investigated.

Setting: Administrative databases used in a specific geographical area of the Emilia-Romagna region, Italy.

Participants: The algorithm was applied to patients aged >18 years who were resident in the Emilia-Romagna region, had a lung cancer diagnosis between January and December 2017, had been hospitalised at IRST or at one of the hospitals in the Forlì-Cesena area, and for whom EHR diagnosis data were available.

Outcome measures: Overall accuracy, positive (PPV) and negative (NPV) predictive values, sensitivity and specificity, positive and negative likelihood ratios and the diagnostic odds ratio (OR) were calculated.
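
The measures listed above all follow from a 2x2 table that crosses the algorithm's classification with the reference (EHR) diagnosis. The sketch below uses the standard textbook definitions; the function name and the cell counts in the usage lines are illustrative assumptions. The counts were chosen so that the rounded outputs agree with the percentages reported in the Results, and are not taken from the paper.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard diagnostic accuracy measures from a 2x2 table.

    tp/fn: reference-positive cases classified positive/negative.
    fp/tn: reference-negative cases classified positive/negative.
    No guard against empty cells: a zero denominator raises ZeroDivisionError.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "diagnostic_or": (tp * tn) / (fp * fn),
    }


# Illustrative counts only (assumption, not study data); they sum to 314.
metrics = diagnostic_metrics(tp=231, fp=25, fn=29, tn=29)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```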

Results: A total of 430 patients were identified as lung cancer cases based on the ICD-9 diagnosis. Among the total incident cases (n=314), the algorithm had an overall accuracy of 82.8% and a sensitivity of 88.8%. The analysis confirmed a high PPV (90.2%) but lower specificity (53.7%) and NPV (50%). A longer length of stay seemed to be associated with correct classification. Hospitalisation regimen and the supply of antiblastic therapy seemed to increase the PPV.
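
The likelihood ratios and diagnostic OR are not quoted in this abstract, but rough values can be derived from the reported sensitivity and specificity. The figures below are a back-of-the-envelope calculation from the rounded percentages above, so they may differ slightly from the values given in the full paper.

```latex
\begin{align*}
LR^{+} &= \frac{\text{sensitivity}}{1-\text{specificity}}
        = \frac{0.888}{1-0.537} \approx 1.92 \\
LR^{-} &= \frac{1-\text{sensitivity}}{\text{specificity}}
        = \frac{1-0.888}{0.537} \approx 0.21 \\
\text{DOR} &= \frac{\text{sensitivity}\times\text{specificity}}
                   {(1-\text{sensitivity})(1-\text{specificity})}
            = \frac{0.888\times 0.537}{0.112\times 0.463} \approx 9.2
\end{align*}
```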

Conclusion: The algorithm demonstrated strong validity for identifying NSCLC among patients with lung cancer in hospital administrative databases and can be used to investigate the quality of cancer care for this population.

Trial registration number: NCT04676321.

Keywords: health informatics; information technology; oncology; respiratory tract tumours.

MeSH terms

  • Algorithms
  • Carcinoma, Non-Small-Cell Lung* / diagnosis
  • Carcinoma, Non-Small-Cell Lung* / epidemiology
  • Databases, Factual
  • Humans
  • International Classification of Diseases
  • Italy / epidemiology
  • Lung Neoplasms* / diagnosis
  • Lung Neoplasms* / epidemiology
  • Retrospective Studies

Associated data

  • ClinicalTrials.gov/NCT04676321