Integrating multi-type aberrations from DNA and RNA through dynamic mapping gene space for subtype-specific breast cancer driver discovery

PeerJ. 2023 Feb 3:11:e14843. doi: 10.7717/peerj.14843. eCollection 2023.

Abstract

Driver event discovery is a crucial demand for breast cancer diagnosis and therapy. In particular, discovering subtype-specificity of drivers can prompt the personalized biomarker discovery and precision treatment of cancer patients. Still, most of the existing computational driver discovery studies mainly exploit the information from DNA aberrations and gene interactions. Notably, cancer driver events would occur due to not only DNA aberrations but also RNA alternations, but integrating multi-type aberrations from both DNA and RNA is still a challenging task for breast cancer drivers. On the one hand, the data formats of different aberration types also differ from each other, known as data format incompatibility. On the other hand, different types of aberrations demonstrate distinct patterns across samples, known as aberration type heterogeneity. To promote the integrated analysis of subtype-specific breast cancer drivers, we design a "splicing-and-fusing" framework to address the issues of data format incompatibility and aberration type heterogeneity simultaneously. To overcome the data format incompatibility, the "splicing-step" employs a knowledge graph structure to connect multi-type aberrations from the DNA and RNA data into a unified formation. To tackle the aberration type heterogeneity, the "fusing-step" adopts a dynamic mapping gene space integration approach to represent the multi-type information by vectorized profiles. The experiments also demonstrate the advantages of our approach in both the integration of multi-type aberrations from DNA and RNA and the discovery of subtype-specific breast cancer drivers. In summary, our "splicing-and-fusing" framework with knowledge graph connection and dynamic mapping gene space fusion of multi-type aberrations data from DNA and RNA can successfully discover potential breast cancer drivers with subtype-specificity indication.

Keywords: Data integration; Driver gene; Knowledge graph; Subtype-specificity.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms* / genetics
  • DNA
  • Female
  • Humans
  • RNA / genetics

Substances

  • RNA
  • DNA

Grants and funding

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61901322, 62202117, and 61974109). This work is also supported by the Special Foundation in Department of Higher Education of Guangdong (Grant No. 2022ZDX2053). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.