Microarray Normalization Revisited for Reproducible Breast Cancer Biomarkers

Biomed Res Int. 2020 Aug 6:2020:1363827. doi: 10.1155/2020/1363827. eCollection 2020.

Abstract

Precision medicine for breast cancer relies on biomarkers to select therapies. However, the reliability of biomarkers derived from gene expression arrays has been questioned and calls for reassessment, particularly for large datasets. We revisit widely used data-normalization procedures and evaluate differences in outcome in order to pinpoint the most reliable preprocessing methods on which biomarkers can be based. We generated a database of 3753 breast cancer patients from 38 studies by downloading and curating patient samples from NCBI-GEO. As gene-expression biomarkers, we selected receptor status assessment and breast cancer subtype classification. Each normalization procedure is applied separately, and the biomarkers are then evaluated for each patient. Differences between normalization pipelines are quantified as the percentage of patients whose outcomes differ between pipelines. Some normalization procedures lead to quite consistent biomarkers, differing in only 1-2% of patients. Other normalization procedures, some of which have been used in many clinical studies, produce troubling discrepancies (10% or more). A good deal of the doubt regarding the reliability of microarrays may be rooted in the haphazard application of inadequate preprocessing pipelines. Several modes of batch correction are evaluated for whether they improve receptor prediction from gene expression relative to the gold standard of immunohistochemistry. Finally, we nominate the normalization methods that yield consistent and trustworthy results. Adequate bioinformatics data preprocessing is crucial for any subsequent statistics to arrive at trustworthy results. We conclude with a suggestion for future bioinformatics development to further increase the reliability of cancer biomarkers.
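
The comparison metric described in the abstract, the percentage of patients whose biomarker outcome differs between two normalization pipelines, can be illustrated with a minimal sketch. This is not the authors' code: the function names, pipeline labels, patient IDs, and receptor calls below are hypothetical, and the sketch assumes each pipeline's result is available as a mapping from patient ID to a categorical call.

```python
from itertools import combinations


def discordance_percent(calls_a, calls_b):
    """Percentage of shared patients whose call differs between two pipelines."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two pipelines have no patients in common")
    differing = sum(1 for patient in shared if calls_a[patient] != calls_b[patient])
    return 100.0 * differing / len(shared)


def pairwise_discordance(calls_by_pipeline):
    """Discordance percentage for every unordered pair of pipelines."""
    return {
        (a, b): discordance_percent(calls_by_pipeline[a], calls_by_pipeline[b])
        for a, b in combinations(sorted(calls_by_pipeline), 2)
    }


# Hypothetical receptor-status calls for four patients under three pipelines.
calls = {
    "pipeline_A": {"P1": "ER+", "P2": "ER-", "P3": "ER+", "P4": "ER+"},
    "pipeline_B": {"P1": "ER+", "P2": "ER-", "P3": "ER-", "P4": "ER+"},
    "pipeline_C": {"P1": "ER+", "P2": "ER-", "P3": "ER+", "P4": "ER+"},
}

result = pairwise_discordance(calls)
for pair, percent in result.items():
    print(pair, f"{percent:.1f}%")
```

Restricting the comparison to patients present in both pipelines is a design choice of this sketch; a real analysis would also need to decide how to treat samples that fail preprocessing in one pipeline only.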

MeSH terms

  • Biomarkers, Tumor* / biosynthesis
  • Biomarkers, Tumor* / genetics
  • Breast Neoplasms* / genetics
  • Breast Neoplasms* / metabolism
  • Computational Biology*
  • Databases, Nucleic Acid*
  • Female
  • Gene Expression Profiling*
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Oligonucleotide Array Sequence Analysis*

Substances

  • Biomarkers, Tumor