Extended endocrine therapy in breast cancer: A basket of length-constraint feature selection metaheuristics to balance Type I against Type II errors

J Biomed Inform. 2022 Jul:131:104112. doi: 10.1016/j.jbi.2022.104112. Epub 2022 Jun 6.

Abstract

Extended endocrine therapy beyond 5 years is a major concern for ER+ breast cancer survivors. However, routinely used genomic tests were designed to estimate early recurrence risk, and it may be inappropriate to apply them to the risk of distant recurrence within 10 years in the extended-treatment setting. These tests were originally designed for high sensitivity, accepting Type I error rates much higher than Type II error rates. Because their positive predictive values (PPVs) are lower, they can flag many false positives: patients who may not need further treatment options and whose quality of life could be adversely affected by them. As an alternative, we proposed a top-down approach to these issues. We assembled a panel of 149 targeted genes from four genomic tests and studied it in 381 ER-positive, node-negative patients who either remained metastasis-free beyond 10 years (n = 202) or developed metastasis within 10 years (n = 179). Using a basket of SVM-wrapped length-constraint feature selection (LCFS) metaheuristics, we discovered four genomic SVMs that traded off Type I against Type II errors. Two independent cohorts were used to validate disease outcome predictions. A 36-gene SVM balanced sensitivity against PPV at good levels: 74% vs 76% on 10-fold cross validation (n = 347) and 75% vs 71% on a test set (n = 34). Neither Oncotype DX RS (cutoffs = 18, 31, 60.97) nor PAM50 ROR-S (cutoffs = 29, 53, 61.18) achieved this balance. In the independent cohorts, the 36-gene SVM predicted disease-free survival (DFS; n = 136, HR = 2.59; 95% CI, 1.4-4.8) and disease-specific survival (DSS; n = 127, HR = 4.06; 95% CI, 1.63-10.11) better than RS (DFS, HR = 2.15; DSS, HR = 3.86) and ROR-S (DFS, HR = 2.29; DSS, HR = 2.76). This case study demonstrates how we identified a genomic test that balances Type I against Type II errors for risk stratification. The top-down approach centered on the LCFS-metaheuristics basket is a generic methodology for clinical decision-making and quality of life, using targeted profiling data in which the number of dimensions (p) is smaller than the number of samples (n).
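The idea of an SVM-wrapped, length-constrained search that balances sensitivity against PPV can be illustrated with a minimal sketch. It assumes positives are patients with metastasis within 10 years, so a Type I error is a false positive and a Type II error is a false negative. Everything here is a hypothetical stand-in, not the authors' implementation: a linear SVM is fitted by stochastic subgradient descent in pure Python instead of a library solver, a single swap-based hill climber replaces the basket of metaheuristics, min(sensitivity, PPV) is an assumed balance criterion, and the data are synthetic. The function names (`sensitivity_ppv`, `train_linear_svm`, `cv_objective`, `length_constrained_search`) are illustrative only. The length constraint is enforced by only ever swapping one selected feature for one unselected feature, so every candidate subset has exactly k features.

```python
import random

# Label convention (assumption for this sketch): 1 = distant metastasis within
# 10 years (positive), 0 = metastasis-free beyond 10 years (negative).
# Type I error = false positive; Type II error = false negative.


def sensitivity_ppv(y_true, y_pred):
    """Return (sensitivity, PPV) = (TP/(TP+FN), TP/(TP+FP))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sens, ppv


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=20, seed=0):
    """Soft-margin linear SVM fitted by stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            yi = 1.0 if y[i] == 1 else -1.0
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            if yi * score < 1:  # inside the margin: hinge term contributes
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, X[i])]
                b += lr * yi
            else:  # only the L2 penalty contributes
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, X):
    """Classify as positive (1) when the SVM decision value is above zero."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0 for x in X]


def cv_objective(X, y, features, folds=5):
    """Pooled k-fold CV predictions on a feature subset, scored by min(sensitivity, PPV)."""
    sub = [[row[j] for j in features] for row in X]
    preds = [0] * len(X)
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        w, b = train_linear_svm([sub[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict(w, b, [sub[i] for i in test])):
            preds[i] = p
    return min(sensitivity_ppv(y, preds))


def length_constrained_search(X, y, k, restarts=2, seed=0):
    """Hill climbing over subsets of exactly k features via drop-one/add-one swaps."""
    rng = random.Random(seed)
    p = len(X[0])
    best_set, best_score = None, -1.0
    for _ in range(restarts):
        current = rng.sample(range(p), k)
        score = cv_objective(X, y, current)
        improved = True
        while improved:
            improved = False
            swaps = [(o, a) for o in current for a in range(p) if a not in current]
            for out, add in swaps:
                cand = [f for f in current if f != out] + [add]
                s = cv_objective(X, y, cand)
                if s > score:  # first-improvement move; subset length stays k
                    current, score, improved = cand, s, True
                    break
        if score > best_score:
            best_set, best_score = sorted(current), score
    return best_set, best_score


if __name__ == "__main__":
    rng = random.Random(42)
    X, y = [], []
    for i in range(60):  # synthetic data: 8 "genes", features 0 and 1 informative
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(8)]
        row[0] += 1.5 if label else -1.5
        row[1] += 1.0 if label else -1.0
        X.append(row)
        y.append(label)
    subset, score = length_constrained_search(X, y, k=3)
    print("selected features:", subset, "min(sens, PPV):", round(score, 2))
```

For example, `sensitivity_ppv([1, 1, 0, 0], [1, 1, 1, 1])` returns `(1.0, 0.5)`: a rule that calls every patient high-risk reaches perfect sensitivity, but half of its positive calls are Type I errors, which is the low-PPV pattern described above. Scoring subsets by the smaller of the two metrics penalizes that imbalance directly; a weighted combination would instead shift the trade-off toward one error type.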

Keywords: Feature selection; Metaheuristics; Quality of life; Targeted gene panel; Translational bioinformatics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms* / drug therapy
  • Breast Neoplasms* / genetics
  • Breast Neoplasms* / pathology
  • Female
  • Humans
  • Neoplasm Recurrence, Local / genetics
  • Neoplasm Recurrence, Local / pathology
  • Predictive Value of Tests
  • Prognosis
  • Quality of Life