A study of the suitability of autoencoders for preprocessing data in breast cancer experimentation

J Biomed Inform. 2017 Aug:72:33-44. doi: 10.1016/j.jbi.2017.06.020. Epub 2017 Jun 27.

Abstract

Breast cancer is the most common cause of cancer death in women. Today, post-transcriptional protein products of the genes involved in breast cancer can be identified by immunohistochemistry. However, this method suffers from intra-observer and inter-observer variability in the assessment of pathologic variables, which may lead to misleading conclusions. An optimal selection of preprocessing techniques may help to reduce observer variability. Deep learning has emerged as a powerful technique for machine learning tasks such as classification and regression. The aim of this work is to use autoencoders (neural networks commonly used to feed deep learning architectures) to improve the quality of the data for developing immunohistochemistry signatures with prognostic value in breast cancer. Our testing on data from 222 patients with invasive non-special type breast carcinoma shows that automatic binarization of experimental data after autoencoding could outperform other classical preprocessing techniques (such as human-dependent binarization or automatic binarization alone) when applied to the prognosis of breast cancer by immunohistochemical signatures.
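
To make the "autoencode, then binarize automatically" idea concrete, the following is a minimal sketch only and not the authors' implementation. The abstract does not specify the network architecture, the training procedure, or the binarization rule, so everything here is an assumption made for illustration: a one-hidden-layer sigmoid autoencoder trained with plain stochastic gradient descent, a per-marker mean threshold as the automatic binarization, and a small invented table of marker scores scaled to [0, 1].

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, model):
    """Return (hidden code, reconstruction) for one input row."""
    w_enc, b_enc, w_dec, b_dec = model
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_enc, b_enc)]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
         for row, b in zip(w_dec, b_dec)]
    return h, y


def train_autoencoder(rows, n_hidden=2, epochs=300, lr=0.5, seed=0):
    """Assumed setup: one hidden layer, squared error, per-sample SGD."""
    rng = random.Random(seed)
    n_in = len(rows[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b_enc, b_dec = [0.0] * n_hidden, [0.0] * n_in
    model = (w_enc, b_enc, w_dec, b_dec)
    for _ in range(epochs):
        for x in rows:
            h, y = forward(x, model)
            # Backpropagate the squared reconstruction error through both sigmoids.
            d_out = [(y[i] - x[i]) * y[i] * (1 - y[i]) for i in range(n_in)]
            d_hid = [h[j] * (1 - h[j]) * sum(d_out[i] * w_dec[i][j] for i in range(n_in))
                     for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    w_dec[i][j] -= lr * d_out[i] * h[j]
                b_dec[i] -= lr * d_out[i]
            for j in range(n_hidden):
                for i in range(n_in):
                    w_enc[j][i] -= lr * d_hid[j] * x[i]
                b_enc[j] -= lr * d_hid[j]
    return model


def mse(rows, model):
    """Mean squared reconstruction error over all rows and markers."""
    total = sum((yi - xi) ** 2 for x in rows for xi, yi in zip(x, forward(x, model)[1]))
    return total / (len(rows) * len(rows[0]))


def binarize_columns(rows):
    """Assumed automatic rule: a marker is 1 when above its column mean, else 0."""
    means = [sum(col) / len(col) for col in zip(*rows)]
    return [[1 if v > m else 0 for v, m in zip(row, means)] for row in rows]


# Hypothetical marker scores (patients x markers), scaled to [0, 1].
scores = [
    [0.9, 0.8, 0.1, 0.2],
    [0.8, 0.9, 0.2, 0.1],
    [0.9, 0.7, 0.2, 0.2],
    [0.1, 0.2, 0.9, 0.8],
    [0.2, 0.1, 0.8, 0.9],
    [0.2, 0.2, 0.7, 0.9],
]

untrained = train_autoencoder(scores, epochs=0)   # same seed, no updates
model = train_autoencoder(scores)
denoised = [forward(x, model)[1] for x in scores]  # autoencoded data
signatures = binarize_columns(denoised)            # binarized after autoencoding
```

In this sketch, `signatures` plays the role of the binarized marker table that a downstream prognostic model would consume. Swapping `binarize_columns(denoised)` for `binarize_columns(scores)` gives the "automatic binarization only" baseline that the abstract compares against.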

Keywords: Autoencoder; Biomedical data; Breast cancer; Deep learning; Preprocessing.

MeSH terms

  • Breast Neoplasms / diagnosis*
  • Female
  • Humans
  • Machine Learning*
  • Neural Networks, Computer*
  • Observer Variation
  • Prognosis