A five-year (2015 to 2019) analysis of studies focused on breast cancer prediction using machine learning: A systematic review and bibliometric analysis

Zakia Salod; Yashik Singh

doi:10.4081/jphr.2020.1772

J Public Health Res. 2020 Jun 26;9(1):1772. doi: 10.4081/jphr.2020.1772. eCollection 2020 Jun 4.

Authors

Zakia Salod¹, Yashik Singh¹

Affiliation

¹ Department of TeleHealth, University of KwaZulu-Natal, Durban, South Africa.

Abstract

Objective 1 of this study was to investigate trends in publications on breast cancer (BC) prediction using machine learning (ML) by analysing country, first author, journal, institutional collaborations and co-occurrence of author keywords. Objective 2 was to review studies on BC prediction using ML and a blood analysis dataset (Breast Cancer Coimbra Dataset [BCCD]), and objective 3 was to briefly review studies on BC prediction using ML and patients' fine needle aspirate cytology data (Wisconsin Breast Cancer Dataset [WBCD]). The study design was as follows. Objective 1: bibliometric analysis; data source: PubMed (2015-2019). Objective 2: systematic review; data source: Google and Google Scholar (2018-2019). Objective 3: systematic review; data source: Google Scholar (2016-2019). For objective 1, all publications returned by the searches were included (n=2928). For objective 2, all English-language papers in the search results that had a 'PDF' option were included; for objective 3, a sample of such papers was included. For objective 2, all 116 female patients in the BCCD were included, comprising 64 BC-positive patients and 52 controls. For objective 3, all 699 female patients in the WBCD were included, comprising 458 with a benign breast tumour and 241 with a malignant breast tumour. The results showed that the United States of America (USA) produced the highest number of publications (n=803). In total, 2419 first authors contributed to the publications. Breast Cancer Research and Treatment was the highest-ranked journal. Institutional collaborations occurred mainly within the USA. The use of ML for BC screening and detection was the most researched topic. A total of 19 distinct papers were included for objectives 2 and 3. None of the findings from these studies were presented to clinicians for validation.
In conclusion, the use of ML for BC screening and detection is promising.
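
As an illustration of the author-keyword co-occurrence analysis named under objective 1, the following is a minimal sketch in Python. It counts how often pairs of keywords appear together in the same publication. The records are hypothetical and the function name `keyword_cooccurrence` is an assumption introduced here; neither the data nor the procedure is taken from the study, which does not describe its tooling in this abstract.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists, one per publication.
# Illustrative only; these are not data from the study.
records = [
    ["breast cancer", "machine learning", "screening"],
    ["breast cancer", "machine learning", "mammography"],
    ["Machine Learning", "breast cancer", "screening"],
]


def keyword_cooccurrence(keyword_lists):
    """Count how often each unordered keyword pair shares a publication."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalise case and whitespace, and de-duplicate, so that a pair
        # is counted at most once per publication.
        unique = sorted({k.strip().lower() for k in keywords})
        # Sorting first gives every pair a single canonical ordering.
        counts.update(combinations(unique, 2))
    return counts


pairs = keyword_cooccurrence(records)
for pair, n in pairs.most_common(3):
    print(pair, n)
```

Pairs with the highest counts would correspond to the most strongly linked topics in a co-occurrence map; a real analysis would read keyword lists from exported bibliographic records rather than a hard-coded list.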

Keywords: Breast cancer; blood tests; cancer screening; fine needle aspiration; machine learning.

Grants and funding

Funding: This study was funded by ZS.