Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm

Phys Med Biol. 2018 Jan 30;63(3):035020. doi: 10.1088/1361-6560/aaa1ca.

Abstract

To automatically identify a set of effective mammographic image features and build an optimal breast cancer risk stratification model, this study investigates the advantages of applying a machine learning approach embedded with a locality preserving projection (LPP) based feature combination and regeneration algorithm to predict short-term breast cancer risk. A dataset of negative mammograms acquired from 500 women was assembled. This dataset was divided into two age-matched classes: 250 high-risk cases, in which cancer was detected in the next mammography screening, and 250 low-risk cases, which remained negative. First, a computer-aided image processing scheme was applied to segment the fibro-glandular tissue depicted on mammograms and initially compute 44 features related to the bilateral asymmetry of mammographic tissue density distribution between the left and right breasts. Next, a multi-feature fusion based machine learning classifier was built to predict the risk of cancer detection in the next mammography screening. A leave-one-case-out (LOCO) cross-validation method was applied to train and test the machine learning classifier embedded with an LPP algorithm, which generated a new operational vector with 4 features using a maximal variance approach in each LOCO process. Results showed a 9.7% increase in risk prediction accuracy when using this LPP-embedded machine learning approach. An increasing trend of adjusted odds ratios was also detected, in which odds ratios increased from 1.0 to 11.2. This study demonstrated that applying the LPP algorithm effectively reduced feature dimensionality and yielded higher and potentially more robust performance in predicting short-term breast cancer risk.
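
The abstract's validation design, in which the feature reduction step is regenerated inside every LOCO iteration, can be illustrated with a minimal sketch. This is not the study's implementation: the reducer below simply keeps the highest-variance features as a stand-in for LPP, the classifier is a toy nearest-mean rule, and all function names and data values are hypothetical. What it shows is only the structure: the held-out case never influences the reducer or the classifier that scores it.

```python
from statistics import mean, pvariance


def top_variance_indices(rows, k):
    """Stand-in reducer (NOT LPP): indices of the k highest-variance features."""
    n_features = len(rows[0])
    variances = [pvariance([r[j] for r in rows]) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return order[:k]


def project(row, indices):
    """Keep only the selected feature columns of one case."""
    return [row[j] for j in indices]


def nearest_mean_predict(train_rows, train_labels, row):
    """Toy classifier (an assumption): pick the label with the closest class mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels)):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        centroid = [mean(col) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(row, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loco_accuracy(rows, labels, k):
    """Leave-one-case-out accuracy with the reducer refit in every iteration."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        # Refit the reducer on the training cases only, so the held-out
        # case does not leak into feature generation.
        idx = top_variance_indices(train_rows, k)
        reduced_train = [project(r, idx) for r in train_rows]
        pred = nearest_mean_predict(reduced_train, train_labels, project(rows[i], idx))
        correct += pred == labels[i]
    return correct / len(rows)


# Made-up toy data: 6 cases, 3 features, two classes (0 = low risk, 1 = high risk).
toy_rows = [
    [0.0, 5.0, 1.0],
    [0.2, 5.1, 0.9],
    [0.1, 4.9, 1.1],
    [3.0, 5.0, 1.0],
    [3.2, 5.1, 1.1],
    [3.1, 4.9, 0.9],
]
toy_labels = [0, 0, 0, 1, 1, 1]
accuracy = loco_accuracy(toy_rows, toy_labels, k=1)
```

In a setup like the one the abstract describes, the stand-in reducer would be replaced by an LPP projection from 44 features to 4, fitted on the 499 training cases of each iteration.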

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Breast / diagnostic imaging
  • Breast / pathology*
  • Breast Density
  • Breast Neoplasms / diagnosis*
  • Breast Neoplasms / diagnostic imaging
  • Female
  • Humans
  • Image Processing, Computer-Assisted / methods
  • Machine Learning*
  • Mammography / methods*
  • Middle Aged
  • Radiographic Image Interpretation, Computer-Assisted / methods*
  • Risk Assessment / methods*
  • Young Adult