Delving into gene-set multiplex networks facilitated by a k-nearest neighbor-based measure of similarity

Comput Struct Biotechnol J. 2023 Oct 11:21:4988-5002. doi: 10.1016/j.csbj.2023.09.042. eCollection 2023.

Abstract

Gene sets are functional units of living cells. A limited number of studies have investigated the complex relations among gene sets, but how these relations change across biological conditions remains poorly documented. In this study, we adopted and modified a classical k-nearest neighbor-based association function to detect inter-gene-set similarities. Using this method, we built multiplex networks of gene sets for the first time; these networks contain layers of gene sets that correspond to different populations of cells. As demonstrated in this study, the context-based multiplex networks capture meaningful biological variation and differ considerably from knowledge-based networks of gene sets built on Jaccard similarity. Furthermore, at the scale of individual gene sets, the structural coefficients of gene sets (multiplex PageRank centrality, clustering coefficient, and participation coefficient) reveal the diversity of gene sets in terms of their structural properties and make it easier to identify unique gene sets. In gene set enrichment analysis (GSEA), each gene set is treated independently, and its contextual and relational attributes are ignored. The structural coefficients of gene sets can supplement GSEA with information about the overall picture of gene sets, supporting a constructive reorganization of the enriched terms and helping researchers better prioritize and select gene sets.

Keywords: Gene set; Gene set co-expression; Gene set enrichment analysis; Multiplex PageRank centrality; Multiplex clustering coefficient; Multiplex network; k-nearest neighbor-based similarity.
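
To make two of the quantities named in the abstract concrete, the following Python sketch shows (1) the Jaccard similarity used for knowledge-based gene-set networks and (2) a multiplex participation coefficient for a single gene set. It is an illustration under stated assumptions, not the authors' implementation: the gene-set names and members are toy data, the function names are hypothetical, and the participation coefficient follows a commonly used definition, P = M/(M-1) * (1 - sum over layers of (k_layer / k_total)^2), which may differ from the exact form used in the paper. The k-nearest neighbor-based similarity itself is not sketched because the abstract does not specify it.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_network(gene_sets, threshold=0.0):
    """Weighted edges between gene sets whose Jaccard similarity exceeds
    `threshold` (the threshold is an illustrative assumption)."""
    edges = {}
    for (name1, genes1), (name2, genes2) in combinations(gene_sets.items(), 2):
        weight = jaccard(genes1, genes2)
        if weight > threshold:
            edges[(name1, name2)] = weight
    return edges


def participation_coefficient(layer_degrees):
    """Multiplex participation coefficient of one node, given its degree
    (or strength) in each layer. Assumed definition:
    P = M/(M-1) * (1 - sum_l (k_l / k_total)^2).
    Returns 0.0 for fewer than two layers or an isolated node."""
    n_layers = len(layer_degrees)
    total = sum(layer_degrees)
    if n_layers < 2 or total == 0:
        return 0.0
    concentration = sum((k / total) ** 2 for k in layer_degrees)
    return (n_layers / (n_layers - 1)) * (1 - concentration)


# Toy gene sets (hypothetical names and members, for illustration only).
toy_gene_sets = {
    "SET_A": {"TP53", "MDM2", "CDKN1A"},
    "SET_B": {"TP53", "MDM2", "BAX"},
    "SET_C": {"GAPDH", "PGK1"},
}

knowledge_edges = jaccard_network(toy_gene_sets)

# A gene set connected equally in three layers versus one confined to a single layer.
p_even = participation_coefficient([2, 2, 2])
p_single = participation_coefficient([4, 0, 0])
```

Under this definition, a gene set whose connections are spread evenly across layers (cell populations) scores close to 1, while one connected in only a single layer scores 0, which is one way such a coefficient can flag gene sets with condition-specific behavior.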