Use of classifiers to optimise the identification and characterisation of metastatic breast cancer in a nationwide administrative registry

Antonis Valachis; Peter Carlqvist; Máté Szilcz; Jonatan Freilich; Simona Vertuani; Barbro Holm; Henrik Lindman

doi:10.1080/0284186X.2021.1979645

Acta Oncol. 2021 Dec;60(12):1604-1610. doi: 10.1080/0284186X.2021.1979645. Epub 2021 Sep 22.

Authors

Antonis Valachis¹, Peter Carlqvist², Máté Szilcz³,⁴, Jonatan Freilich³, Simona Vertuani⁵, Barbro Holm⁵, Henrik Lindman⁶

Affiliations

¹ Department of Oncology, Faculty of Medicine and Health, Örebro University Hospital, Örebro, Sweden.
² Nordic Market Access AB, Stockholm, Sweden.
³ Parexel International, Stockholm, Sweden.
⁴ Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
⁵ Novartis Sverige AB, Kista, Sweden.
⁶ Department of Immunology, Genetics and Pathology, Experimental and Clinical Oncology; Clinical Oncology, Faculty of Medicine, Uppsala University Hospital, Uppsala, Sweden.

PMID: 34549678

Abstract

Background: The prognosis for patients with metastatic breast cancer (MBC) is substantially worse than for patients with earlier stage disease. Therefore, understanding the differences in epidemiology between these two patient groups is important. Studies using population-based cancer registries to identify MBC are hampered by the quality of reporting. Patients are registered once (at time of initial diagnosis); hence only data for patients with de novo MBC are identifiable, whereas data for patients with recurrent MBC are not. This makes accurate estimation of the epidemiology and healthcare utilisation of MBC challenging. This study aimed to investigate whether machine learning could improve identification of MBC in national health registries.

Material and methods: Data for patients with confirmed MBC from a regional breast cancer registry were used to train machine-learning algorithms (or 'classifiers'). The best performing classifier (accuracy 97.3%, positive predictive value 85.1%) was applied to Swedish national registries for 2008 to 2016.
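For readers less familiar with the two performance measures quoted above, the following is a minimal sketch of how accuracy and positive predictive value (PPV) are computed from a binary confusion matrix. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the study, and the abstract does not describe the authors' actual evaluation code.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all predictions (MBC and non-MBC) that are correct."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of patients flagged as MBC who truly have MBC."""
    flagged = tp + fp
    if flagged == 0:
        raise ValueError("no positive predictions")
    return tp / flagged


if __name__ == "__main__":
    # Hypothetical counts (NOT from the study): a rare positive class.
    tp, fp, tn, fn = 80, 20, 880, 20
    print(f"accuracy: {accuracy(tp, fp, tn, fn):.3f}")
    print(f"PPV:      {positive_predictive_value(tp, fp):.3f}")
```

When the positive class is rare, the many true negatives dominate accuracy, so accuracy can be considerably higher than PPV, which is why it is useful to report both.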

Results: Mean yearly MBC incidence was estimated at 14 per 100,000 person-years (with 18% diagnosed de novo and 76% of the total with HR-positive MBC).
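As an illustration of the unit used above, an incidence rate per 100,000 person-years is the number of new cases divided by the person-time at risk, scaled by 100,000. The sketch below assumes a hypothetical helper name and hypothetical inputs; none of the numbers are study data.

```python
def incidence_per_100k(new_cases: int, person_years: float) -> float:
    """Incidence rate expressed per 100,000 person-years at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases * 100_000 / person_years


if __name__ == "__main__":
    # Hypothetical inputs (NOT from the study).
    rate = incidence_per_100k(new_cases=250, person_years=2_000_000)
    print(f"{rate:.1f} per 100,000 person-years")
```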

Conclusion: To our knowledge, this is the first study to use machine learning to identify MBC regardless of stage at diagnosis in health registries covering the entire population of Sweden.

Keywords: Breast cancer; European cohort; classifier; epidemiology; health registries; metastatic; retrospective study.

MeSH terms

Breast
Breast Neoplasms* / epidemiology
Female
Humans
Neoplasm Recurrence, Local
Prognosis
Registries