Use of classifiers to optimise the identification and characterisation of metastatic breast cancer in a nationwide administrative registry

Acta Oncol. 2021 Dec;60(12):1604-1610. doi: 10.1080/0284186X.2021.1979645. Epub 2021 Sep 22.

Abstract

Background: The prognosis for patients with metastatic breast cancer (MBC) is substantially worse than for patients with earlier-stage disease. Therefore, understanding the differences in epidemiology between these two patient groups is important. Studies using population-based cancer registries to identify MBC are hampered by the quality of reporting. Patients are registered once (at the time of initial diagnosis); hence only data for patients with de novo MBC are identifiable, whereas data for patients with recurrent MBC are not. This makes accurate estimation of the epidemiology and healthcare utilisation of MBC challenging. This study aimed to investigate whether machine learning could improve identification of MBC in national health registries.

Material and methods: Data for patients with confirmed MBC from a regional breast cancer registry were used to train machine-learning algorithms (or 'classifiers'). The best performing classifier (accuracy 97.3%, positive predictive value 85.1%) was applied to Swedish national registries for 2008 to 2016.
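The two performance figures quoted above measure different things: accuracy is the share of all patients classified correctly, whereas positive predictive value (PPV) is the share of patients flagged as MBC who truly have MBC. A minimal sketch of both calculations follows. The function name and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return accuracy and positive predictive value (PPV) from confusion-matrix counts.

    tp/fp: patients flagged as MBC who do / do not have MBC.
    tn/fn: patients not flagged who do not / do have MBC.
    """
    total = tp + fp + tn + fn
    flagged = tp + fp
    if total == 0 or flagged == 0:
        raise ValueError("need at least one patient and at least one positive prediction")
    return {
        "accuracy": (tp + tn) / total,  # correct calls over all calls
        "ppv": tp / flagged,            # true positives over all flagged patients
    }


# Hypothetical counts (NOT study data): a rare positive class of 50 among 1,000 patients.
example = classification_metrics(tp=40, fp=10, tn=940, fn=10)
```

With these made-up counts, accuracy is 98% while PPV is 80%. When the positive class is rare, the many true negatives dominate accuracy, which is one reason a classifier can show a higher accuracy than PPV.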

Results: Mean yearly MBC incidence was estimated at 14 per 100,000 person-years (with 18% diagnosed de novo and 76% of the total with HR-positive MBC).
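An incidence rate per 100,000 person-years is the number of new cases divided by the total person-time at risk, scaled by 100,000. The sketch below shows the arithmetic only. The function name and the input figures are hypothetical and are not the study's case counts or population denominators.

```python
def incidence_per_100k(new_cases: int, person_years: float) -> float:
    """Return the incidence rate expressed per 100,000 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    # Multiply before dividing so round inputs give a round result.
    return new_cases * 100_000 / person_years


# Hypothetical inputs (NOT study data): 30 new cases observed over 250,000 person-years.
example_rate = incidence_per_100k(new_cases=30, person_years=250_000)
```

For these illustrative inputs the rate is 12 per 100,000 person-years.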

Conclusion: To our knowledge, this is the first study to use machine learning to identify MBC regardless of stage at diagnosis in health registries covering the entire population of Sweden.

Keywords: Breast cancer; European cohort; classifier; epidemiology; health registries; metastatic; retrospective study.

MeSH terms

  • Breast
  • Breast Neoplasms* / epidemiology
  • Female
  • Humans
  • Neoplasm Recurrence, Local
  • Prognosis
  • Registries