Gene co-expression network based on part mutual information for gene-to-gene relationship and gene-cancer correlation analysis

BMC Bioinformatics. 2022 May 24;23(1):194. doi: 10.1186/s12859-022-04732-9.

Abstract

Background: Finding correlation patterns is an important goal of analyzing biological data. Currently available methods for correlation analysis mainly use indirect association measures, such as the Pearson correlation coefficient, and focus on interpreting networks at the level of modules. For biological objects such as genes, collective function depends on pairwise gene-to-gene interactions. However, the large number of redundant results produced by module-level methods often necessitates further detailed analysis of gene interactions. New approaches to measuring direct associations among variables, such as the part mutual information (PMI), may help us better interpret the correlation patterns of biological data at the level of variable pairs.
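
The distinction between direct and indirect association can be illustrated with a small simulation. The sketch below is not PMI and is not the authors' method: it uses linear partial correlation as a simpler stand-in for a direct-association measure, and the chain gene_a -> gene_b -> gene_c, the noise levels, and the helper partial_corr are all hypothetical choices made for illustration. It shows that the Pearson correlation between the two end genes is high even though they interact only through the middle gene, while the association largely disappears once the middle gene is conditioned on.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both.

    Hypothetical helper: a linear stand-in for a direct-association
    measure, not an implementation of part mutual information.
    """
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]


rng = np.random.default_rng(0)
n = 2000

# Simulated expression values for an assumed chain: gene_a -> gene_b -> gene_c.
gene_a = rng.normal(size=n)
gene_b = gene_a + 0.5 * rng.normal(size=n)
gene_c = gene_b + 0.5 * rng.normal(size=n)

# Indirect measure: plain Pearson correlation between the two end genes.
pcc_ac = np.corrcoef(gene_a, gene_c)[0, 1]

# Direct measure (stand-in): correlation of gene_a and gene_c given gene_b.
pcorr_ac = partial_corr(gene_a, gene_c, gene_b)

print(f"Pearson correlation a-c:       {pcc_ac:.3f}")
print(f"Partial correlation a-c given b: {pcorr_ac:.3f}")
```

In a network built by thresholding the Pearson values, the a-c pair would appear as an edge although no direct interaction was simulated; a direct-association measure would be expected to drop it, which is one way a network can end up with fewer edges.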

Results: We use PMI to calculate gene co-expression networks from cancer mRNA transcriptome data. Our results show that the PMI-based networks, with fewer edges, can represent the correlation pattern and are robust across biological conditions. The PMI-based networks recall significantly more of the important omics-defined gene-pair relationships than the Pearson correlation coefficient (PCC)-based networks. Based on scores derived from PMI-recalled copy number variation or DNA methylation gene pairs, patients with cancer can be divided into groups with significant differences in disease-specific survival.

Conclusions: PMI, which measures direct associations between variables, extracts more important biological relationships at the level of gene pairs than conventional indirect association measures do. It can be used to refine module-level results from other correlation methods. In particular, PMI is beneficial for analyzing biological data from complex systems, such as cancer transcriptome data.

Keywords: Cancer survival analysis; Correlation analysis; Direct association; Multiple omics integration; Part mutual information.

MeSH terms

  • Correlation of Data
  • DNA Copy Number Variations*
  • Gene Expression Profiling
  • Gene Regulatory Networks
  • Humans
  • Neoplasms* / genetics
  • Transcriptome