NetActivity enhances transcriptional signals by combining gene expression into robust gene set activity scores through interpretable autoencoders

Nucleic Acids Res. 2024 May 22;52(9):e44. doi: 10.1093/nar/gkae197.

Abstract

Grouping gene expression into gene set activity scores (GSAS) provides better biological insights than studying individual genes. However, existing gene set projection methods cannot return representative, robust, and interpretable GSAS. We developed NetActivity, a machine learning framework that generates GSAS with a sparsely connected autoencoder in which each neuron in the inner layer represents a gene set. We proposed a three-tier training procedure that yielded representative, robust, and interpretable GSAS. The NetActivity model was trained with 1518 GO biological process terms and KEGG pathways, using all GTEx samples. NetActivity generates GSAS that are robust to the initialization parameters and representative of the original transcriptome, and it assigns higher importance to more biologically relevant genes. Moreover, NetActivity returns GSAS with a more consistent definition and higher interpretability than GSVA and hipathia, two state-of-the-art gene set projection methods. Finally, NetActivity enables combining bulk RNA-seq and microarray datasets in a meta-analysis of prostate cancer progression, highlighting gene sets related to cell division, which is key for disease progression. When NetActivity was applied to metastatic prostate cancer, gene sets associated with cancer progression were also found to be altered due to drug resistance, whereas a classical enrichment analysis identified gene sets irrelevant to the phenotype. NetActivity is publicly available in Bioconductor and GitHub.
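
The core idea of a sparsely connected inner layer, where each neuron only receives input from the genes of one gene set, can be illustrated with a masked linear projection. The sketch below is not the NetActivity implementation: it omits the decoder and the three-tier training described in the abstract, uses random rather than learned weights, and the function names (`build_mask`, `gene_set_scores`), the per-gene standardization step, and the toy gene and gene set names are all assumptions made for illustration.

```python
import numpy as np


def build_mask(genes, gene_sets):
    """Return a binary (genes x gene sets) matrix: 1 where a gene belongs to a set."""
    index = {gene: i for i, gene in enumerate(genes)}
    mask = np.zeros((len(genes), len(gene_sets)))
    for j, members in enumerate(gene_sets.values()):
        for gene in members:
            if gene in index:  # ignore set members absent from the expression matrix
                mask[index[gene], j] = 1.0
    return mask


def gene_set_scores(expr, weights, mask):
    """Project samples onto gene sets through a masked (sparse) linear layer.

    expr:    samples x genes expression matrix
    weights: genes x gene sets weight matrix (random here, learned in a real model)
    mask:    genes x gene sets membership matrix from build_mask
    """
    # Standardize each gene across samples (an assumption of this sketch).
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-8)
    # Zeroing weights outside the mask means each score depends only on its own genes.
    return z @ (weights * mask)


# Toy example with hypothetical genes and gene sets.
rng = np.random.default_rng(0)
genes = ["G1", "G2", "G3", "G4"]
gene_sets = {"set_A": ["G1", "G2"], "set_B": ["G3", "G4"]}

expr = rng.normal(size=(5, len(genes)))   # 5 samples x 4 genes
mask = build_mask(genes, gene_sets)
weights = rng.normal(size=mask.shape)     # placeholder for trained weights
scores = gene_set_scores(expr, weights, mask)  # 5 samples x 2 gene sets
```

Because the mask zeroes every connection between a gene and the sets it does not belong to, each column of `scores` is a weighted sum over the member genes only, which is what makes the weights readable as per-gene importances within a gene set.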

MeSH terms

  • Algorithms
  • Gene Expression Profiling / methods
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Machine Learning
  • Male
  • Prostatic Neoplasms* / genetics
  • Prostatic Neoplasms* / metabolism
  • Prostatic Neoplasms* / pathology
  • RNA-Seq / methods
  • Transcriptome / genetics