Computational Discrimination of Breast Cancer for Korean Women Based on Epidemiologic Data Only

J Korean Med Sci. 2015 Aug;30(8):1025-34. doi: 10.3346/jkms.2015.30.8.1025. Epub 2015 Jul 15.

Abstract

Breast cancer is the second leading cancer among Korean women, and its incidence rate has been increasing annually. If early diagnosis could be implemented with epidemiologic data, women could easily assess their breast cancer risk using the internet. The National Cancer Institute in the United States has released a Web-based Breast Cancer Risk Assessment Tool based on the Gail model. However, it is not directly applicable to Korean women, since breast cancer risk depends on race. It also shows low accuracy (58%-59%). In this study, breast cancer discrimination models for Korean women are developed using only epidemiological case-control data (n = 4,574). The models are built with three classification techniques: support vector machine, artificial neural network, and Bayesian network. Repeated random sub-sampling validation with 1,000 repetitions is performed for each of a range of parameter conditions. Performance is evaluated and compared using the area under the receiver operating characteristic curve (AUC). The AUC, accuracy, sensitivity, specificity, and calculation time of all models were calculated and compared by age group and classification technique. Although the support vector machine took the longest calculation time, the highest classification performance was achieved for women older than 50 yr (AUC = 64%). The proposed model relies on demographic characteristics, reproductive factors, and lifestyle habits, without using any clinical or genetic test. It is expected that the model could be implemented as a web-based discrimination tool for breast cancer. Such a tool could encourage women who may be prone to breast cancer to go to the hospital for diagnostic tests.
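The validation scheme described in the abstract (repeated random sub-sampling, scored by AUC) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 30% test fraction, and the simple centroid-difference scorer are assumptions made for illustration; the scorer is only a hypothetical stand-in for the support vector machine, artificial neural network, and Bayesian network used in the study.

```python
import random
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a random case scores above a random control.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(X, y):
    """Hypothetical stand-in classifier (NOT one of the study's models).

    Scores a sample by projecting it onto the difference between the
    mean feature vector of cases and that of controls.
    """
    dim = len(X[0])
    cases = [x for x, t in zip(X, y) if t == 1]
    controls = [x for x, t in zip(X, y) if t == 0]
    w = [mean(r[j] for r in cases) - mean(r[j] for r in controls)
         for j in range(dim)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def repeated_subsampling_auc(X, y, fit, n_repeats=1000,
                             test_fraction=0.3, seed=0):
    """Mean test AUC over repeated random train/test splits.

    test_fraction is an assumed value; the abstract does not state the
    split ratio. Each split must contain both classes, otherwise auc()
    or the fit function raises an error.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = max(1, int(len(idx) * test_fraction))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)  # a fresh random split on every repetition
        test, train = idx[:n_test], idx[n_test:]
        score = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([y[i] for i in test],
                        [score(X[i]) for i in test]))
    return mean(aucs)
```

Calling `repeated_subsampling_auc(X, y, fit_centroid_scorer)` on a feature matrix `X` (a list of numeric feature lists) and 0/1 labels `y` returns the AUC averaged over the random splits. Unlike k-fold cross-validation, the test sets of different repetitions may overlap, which is why a large number of repetitions is typically used.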

Keywords: Breast Neoplasms; Computers; Neural Networks; Support Vector Machines.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Breast Neoplasms / diagnosis*
  • Breast Neoplasms / epidemiology*
  • Diagnosis, Computer-Assisted / methods*
  • Early Detection of Cancer / methods*
  • Female
  • Humans
  • Machine Learning*
  • Middle Aged
  • Pattern Recognition, Automated / methods
  • Prevalence
  • Reproducibility of Results
  • Republic of Korea / epidemiology
  • Risk Assessment / methods
  • Risk Factors
  • Sensitivity and Specificity
  • Women's Health / statistics & numerical data*