Deep-Learning Uncovers certain CCM Isoforms as Transcription Factors

Front Biosci (Landmark Ed). 2024 Feb 21;29(2):75. doi: 10.31083/j.fbl2902075.

Abstract

Background: Cerebral Cavernous Malformations (CCMs) are brain vascular abnormalities associated with an increased risk of hemorrhagic stroke. Familial CCMs are inherited in an autosomal dominant manner and involve three genes: KRIT1 (CCM1), MGC4607 (CCM2), and PDCD10 (CCM3). CCM1 and CCM3 each bind to CCM2 to form the CCM Signal Complex (CSC). Both CCM1 and CCM2 show cellular heterogeneity through multiple alternatively spliced isoforms, in which exons from the same gene are combined in different ways to produce distinct mRNA transcripts. Both proteins also shuttle between the nucleus and the cytoplasm, which suggests that they may regulate gene expression as transcription factors (TFs). Accumulating data show that CSC proteins localize to the nucleus and interact with progesterone receptors, which act both as cellular signaling components and as TFs. This raises the question of whether CCM proteins can likewise serve in both capacities.

Methods: To investigate this possibility, we employed our proprietary deep-learning (DL)-based algorithm, which uses a biased Support Vector Machine (SVM) model, to predict whether any of the CSC proteins, particularly CCM isoforms that undergo nucleocytoplasmic shuttling, can act as TFs in gene expression regulation.
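The abstract does not describe how the proprietary biased-SVM model is built. As a minimal sketch only, the Python below assumes one common reading of "biased SVM": a linear SVM whose hinge loss penalizes misclassified positives (known TFs, label +1) more heavily than misclassified unlabeled or negative proteins (label -1), through separate costs c_pos and c_neg. The function names train_biased_svm and predict, the penalty values, and the 2-D toy feature vectors (stand-ins for protein embeddings, for example from an Evolutionary Scale Modeling language model) are all hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def train_biased_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    c_pos: float = 10.0,  # hypothetical cost for misclassified positives
    c_neg: float = 1.0,   # hypothetical cost for misclassified unlabeled/negatives
    lr: float = 0.01,
    epochs: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Hypothetical cost-asymmetric ("biased") linear SVM.

    Minimizes 0.5 * ||w||^2 + sum_i C_i * max(0, 1 - y_i * (w . x_i + b)),
    where C_i = c_pos if y_i = +1 and c_neg otherwise, using stochastic
    subgradient descent.
    """
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    w = [0.0] * dim
    b = 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            cost = c_pos if yi > 0 else c_neg
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularizer's gradient is split evenly across the n samples.
            grad_w = [wj / n for wj in w]
            grad_b = 0.0
            if margin < 1.0:
                # Hinge-loss subgradient, scaled by the class-specific cost.
                grad_w = [g - cost * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b = -cost * yi
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (predicted TF candidate) or -1."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0.0 else -1


# Toy 2-D stand-ins for protein feature vectors (hypothetical data).
X = [[2.0, 2.2], [1.8, 2.5], [2.5, 1.9],
     [-2.0, -1.7], [-2.2, -2.4], [-1.6, -2.1], [-2.5, -1.9]]
y = [1, 1, 1, -1, -1, -1, -1]

w, b = train_biased_svm(X, y)
print([predict(w, b, x) for x in X])
```

Raising c_pos relative to c_neg shifts the decision boundary so that fewer known TFs are missed, at the cost of labeling more unlabeled proteins as TF candidates; this asymmetry is the usual rationale for a biased SVM when the negative set is uncertain.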

Results: Through a comparative DL-based predictive analysis, we identified a total of 11 isoforms across all three CCM proteins (CCM1-3). Furthermore, we substantiated TF functionality for 8 isoforms derived from CCM1 and CCM2, the first identification of CCM isoforms acting as TFs.

Conclusions: This discovery directly challenges the prevailing paradigm, which holds that the CSC acts solely in endothelial cell functions through various potential signaling cascades during angiogenesis.

Keywords: Biased SVM model; CCM2 isoforms; Evolutionary Scale Modeling; Large Language Model; Support Vector Machine; cerebral cavernous malformations; deep-learning; transcription factors.

MeSH terms

  • Carrier Proteins / metabolism
  • Deep Learning*
  • Hemangioma, Cavernous, Central Nervous System* / genetics
  • Humans
  • Protein Isoforms / genetics
  • Protein Isoforms / metabolism
  • Proto-Oncogene Proteins / genetics
  • Proto-Oncogene Proteins / metabolism
  • Receptors, Progesterone / metabolism
  • Transcription Factors / metabolism

Substances

  • Proto-Oncogene Proteins
  • Transcription Factors
  • Receptors, Progesterone
  • Carrier Proteins
  • Protein Isoforms