Twiner: correlation-based regularization for identifying common cancer gene signatures

BMC Bioinformatics. 2019 Jun 25;20(1):356. doi: 10.1186/s12859-019-2937-8.

Abstract

Background: Breast and prostate cancers are typical examples of hormone-dependent cancers. They show remarkable similarities at the level of hormone-related signaling pathways and exhibit a high tropism to bone. Identifying genes that play a specific role in each cancer type brings invaluable insights for gene therapy research by targeting disease-specific cell functions not accounted for so far. Identifying a gene signature common to breast and prostate cancers, however, could unravel new targets to tackle shared hormone-dependent disease features, such as bone relapse. This could enable new targeted therapies directed at genes that regulate both cancer types, with a consequent positive impact on cancer management and health economics.

Results: We address the challenge of extracting gene signatures from transcriptomic data of prostate adenocarcinoma (PRAD) and breast invasive carcinoma (BRCA) samples, in particular estrogen receptor-positive (ER+) BRCA and androgen receptor-positive (AR+) triple-negative breast cancer (TNBC), using sparse logistic regression. We investigate the introduction of gene network information, based on the distances between the BRCA and PRAD correlation matrices, through the proposed twin networks recovery (twiner) penalty. This strategy ensures that genes with similar correlation patterns in the two diseases are penalized less during feature selection.
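The idea of a correlation-based penalty can be illustrated with a minimal sketch. It assumes, for illustration only, that each gene's penalty weight is the angle between its column in the BRCA correlation matrix and its column in the PRAD correlation matrix, scaled to [0, 1]. It also assumes that the classifier is a weighted-lasso logistic regression fitted by proximal gradient descent. The published twiner penalty may use a different distance, normalization, solver, or an additional ridge term. The names `twiner_weights`, `weighted_l1_logistic`, `lam`, and `n_iter` are hypothetical. Both expression matrices must contain the same genes as columns, in the same order.

```python
import numpy as np


def twiner_weights(x_a, x_b):
    """Per-gene penalty weights from two expression matrices (samples x genes).

    Assumption (illustrative): weight_j is the angle between column j of the
    gene-gene correlation matrix of dataset A and of dataset B, divided by the
    largest angle. Similar correlation profiles -> small weight -> weak penalty.
    """
    c_a = np.corrcoef(x_a, rowvar=False)  # genes x genes
    c_b = np.corrcoef(x_b, rowvar=False)
    # Cosine similarity between matching columns of the two correlation matrices.
    num = np.sum(c_a * c_b, axis=0)
    den = np.linalg.norm(c_a, axis=0) * np.linalg.norm(c_b, axis=0)
    angles = np.arccos(np.clip(num / den, -1.0, 1.0))
    max_angle = angles.max()
    return angles / max_angle if max_angle > 0 else np.zeros_like(angles)


def soft_threshold(z, t):
    """Proximal operator of t * |z| (applied elementwise)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def weighted_l1_logistic(x, y, weights, lam=0.1, n_iter=2000):
    """Minimize mean logistic loss + lam * sum_j weights[j] * |beta_j|.

    Uses proximal gradient descent (ISTA) with an unpenalized intercept.
    """
    n, p = x.shape
    beta = np.zeros(p)
    b0 = 0.0
    # Step 1/L, where L = sigma_max([1, X])^2 / (4n) bounds the gradient's
    # Lipschitz constant for the mean logistic loss.
    xa = np.hstack([np.ones((n, 1)), x])
    step = 4.0 * n / np.linalg.norm(xa, 2) ** 2
    for _ in range(n_iter):
        eta = b0 + x @ beta
        prob = 0.5 * (1.0 + np.tanh(0.5 * eta))  # numerically stable sigmoid
        resid = prob - y
        b0 -= step * resid.mean()
        beta = soft_threshold(beta - step * (x.T @ resid) / n,
                              step * lam * weights)
    return b0, beta


# Usage with synthetic data (not real BRCA/PRAD profiles).
rng = np.random.default_rng(0)
w = twiner_weights(rng.normal(size=(40, 6)), rng.normal(size=(50, 6)))
x = rng.normal(size=(200, 5))
y = (x[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)
b0, beta = weighted_l1_logistic(x, y, np.ones(5), lam=0.05)
```

Under these assumptions, genes whose correlation patterns agree across the two cancers receive weights near 0 and can enter the model at little cost. Genes with divergent patterns approach weight 1 and are penalized as in a standard lasso.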

Conclusions: Our analysis identified genes that show a similar correlation pattern in BRCA and PRAD transcriptomic data, are selected as key players in classifying breast and prostate samples into ER+ BRCA/AR+ TNBC/PRAD tumor and normal tissues, and are associated with survival time distributions. The results are supported by the literature and are expected to reveal similarities between the diseases, disclose common disease biomarkers, and help define new strategies for more effective therapies.

Keywords: Breast invasive carcinoma; Gene network; Prostate adenocarcinoma; Sparse logistic regression; Triple-negative breast cancer.

MeSH terms

  • Estrogens / metabolism
  • Female
  • Gene Expression Profiling / methods*
  • Gene Regulatory Networks
  • Humans
  • Logistic Models
  • Male
  • Principal Component Analysis
  • Prostatic Neoplasms / genetics*
  • Prostatic Neoplasms / mortality
  • Prostatic Neoplasms / pathology
  • Receptors, Androgen / metabolism
  • Survival Analysis
  • Transcriptome*
  • Triple Negative Breast Neoplasms / genetics*
  • Triple Negative Breast Neoplasms / mortality
  • Triple Negative Breast Neoplasms / pathology

Substances

  • Estrogens
  • Receptors, Androgen