Development of a Machine Learning-Based Prognostic Model for Hormone Receptor-Positive Breast Cancer Using Nine-Gene Expression Signature

World J Oncol. 2023 Oct;14(5):406-422. doi: 10.14740/wjon1700. Epub 2023 Sep 20.

Abstract

Background: Determining the prognosis of hormone receptor-positive (HR+) breast cancer (BC), which accounts for 80% of all BCs, is critical to improving survival outcomes. Stratifying individuals at high risk of BC-related mortality and improving prognosis have been the focus of research for over a decade. However, existing prognostic tools are not universally applicable because they are limited to clinical factors. We hypothesized that a new framework for predicting prognosis in HR+ BC patients can be developed using artificial intelligence.

Methods: A total of 2,338 HR+ human epidermal growth factor receptor 2-negative (HER2-) BC cases from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), The Cancer Genome Atlas (TCGA), and Gene Expression Omnibus (GEO) cohorts were analyzed. A recurrence prediction model (RPM) was created by extracting nine prognosis-related genes from over 18,000 genes using a logistic regression model. Cases were then divided into high- and low-risk groups using the RPM.
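The risk classification step can be illustrated with a minimal sketch, assuming the RPM behaves like a standard logistic model over nine gene expression values. The abstract does not list the genes, their weights, or the cutoff, so the gene names, coefficients, intercept, and threshold below are hypothetical placeholders and not the published model.

```python
import math

# Hypothetical gene names and coefficients, for illustration only.
# The abstract does not report the nine genes or their weights.
COEFFICIENTS = {
    "GENE_A": 0.8, "GENE_B": 0.5, "GENE_C": -0.4,
    "GENE_D": 0.3, "GENE_E": -0.6, "GENE_F": 0.2,
    "GENE_G": 0.7, "GENE_H": -0.1, "GENE_I": 0.4,
}
INTERCEPT = -0.5  # hypothetical
CUTOFF = 0.5      # hypothetical probability threshold for "high risk"


def recurrence_probability(expression):
    """Logistic function of a weighted sum of (normalized) expression values."""
    z = INTERCEPT + sum(w * expression[g] for g, w in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def risk_group(expression, cutoff=CUTOFF):
    """Assign a case to the high- or low-risk group by thresholding the score."""
    return "high" if recurrence_probability(expression) >= cutoff else "low"
```

In this sketch, each case is a mapping from gene name to a normalized expression value, and `risk_group(case)` returns `"high"` or `"low"`. A real analysis would fit the coefficients on the discovery cohort and choose the cutoff before applying it to the validation cohorts.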

Results: Risk classification by the RPM significantly stratified prognosis in both the discovery cohort and the validation cohort. In the time-dependent area under the curve analysis, accuracy varied somewhat by cohort but declined significantly after about 10 years. Gene Set Enrichment Analysis showed that cell cycle-related gene sets, MYC, and PI3K-AKT-mTOR signaling were enriched in high-risk tumors. High-risk tumors were associated with high levels of immune cells of the lymphoid and myeloid lineages and with high immune cytolytic activity, as well as with low levels of stem cells and stromal cells. High-risk tumors were also associated with poor therapeutic effects of chemotherapy and endocrine therapy.

Conclusions: This model was able to stratify prognosis in multiple cohorts. This is because the model reflects major BC therapeutic target pathways and the tumor immune microenvironment, and it is further supported by the therapeutic effect of chemotherapy and endocrine therapy.

Keywords: Breast cancer; Cancer genomics; Machine learning; Recurrence prediction; Tumor immune microenvironment.