Ranking genome-wide correlation measurements improves microarray and RNA-seq based global and targeted co-expression networks

Sci Rep. 2018 Jul 18;8(1):10885. doi: 10.1038/s41598-018-29077-3.

Abstract

Co-expression networks are essential tools for inferring biological associations between gene products and predicting gene annotation. Global networks can be analyzed at the transcriptome-wide scale, or they can be queried with a set of guide genes to capture the transcriptional landscape of a given pathway, a process named Pathway Level Coexpression (PLC). A critical step in network construction remains the definition of gene co-expression. In the present work, we compared how the Pearson Correlation Coefficient (PCC), the Spearman Correlation Coefficient (SCC), their respective ranked values (Highest Reciprocal Rank, HRR), Mutual Information (MI) and Partial Correlation (PC) performed on global networks and PLCs. This evaluation was conducted on the model plant Arabidopsis thaliana using microarray datasets and RNA-seq datasets pre-processed in different ways. In particular, we evaluated how each dataset × distance measure combination performed in 5 PLCs corresponding to 4 well-described plant metabolic pathways (phenylpropanoid, carbohydrate, fatty acid and terpene metabolism) and the cytokinin signaling pathway. Our results show that, with both microarray and RNA-seq data, PCC ranked with HRR is better suited than the other distance measures for global network construction and PLC, especially for clustering genes into partitions that resemble biological subpathways.
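The abstract contrasts raw correlations (PCC, SCC) with their HRR-ranked versions. The sketch below illustrates that ranking step under stated assumptions; it is not the authors' pipeline. It assumes the commonly used definition HRR(A, B) = max(rank(A, B), rank(B, A)), where rank(A, B) is the position of B in A's partner list sorted by decreasing signed correlation (1 = best partner), so lower HRR means a stronger mutual association. The gene names and expression values are invented toy data, the helper names (`pearson`, `average_ranks`, `spearman`, `hrr`) are hypothetical, ties in partner lists are broken by input order, and every profile is assumed to be non-constant. MI and partial correlation are not covered.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation (PCC); assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation (SCC): PCC computed on the ranked values."""
    return pearson(average_ranks(x), average_ranks(y))


def hrr(expr: Dict[str, Sequence[float]],
        corr: Callable[[Sequence[float], Sequence[float]], float] = pearson
        ) -> Dict[Tuple[str, str], int]:
    """HRR for each unordered gene pair (assumed definition, see lead-in)."""
    genes = list(expr)
    c = {(a, b): corr(expr[a], expr[b]) for a in genes for b in genes if a != b}
    rank: Dict[Tuple[str, str], int] = {}
    for a in genes:
        # Stable sort: equal correlations keep the input gene order.
        partners = sorted((b for b in genes if b != a), key=lambda b: -c[(a, b)])
        for r, b in enumerate(partners, start=1):
            rank[(a, b)] = r
    return {(a, b): max(rank[(a, b)], rank[(b, a)])
            for i, a in enumerate(genes) for b in genes[i + 1:]}


# Hypothetical toy profiles: 4 genes measured in 5 samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 6.0, 8.2, 9.9],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
    "geneD": [1.0, 3.0, 2.0, 5.0, 4.0],
}
scores = hrr(expr)                       # HRR built on PCC
scores_scc = hrr(expr, corr=spearman)    # HRR built on SCC
for pair, value in sorted(scores.items(), key=lambda kv: kv[1]):
    print(pair, value)
```

In this toy set geneA and geneB are each other's best partner, so their PCC-based HRR is 1. Because HRR keeps only the worse of the two reciprocal ranks, a pair scores well only when the association is strong from both genes' points of view, which is one reason ranked measures can be less sensitive to highly connected genes than a raw correlation cutoff.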

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Arabidopsis / genetics*
  • Arabidopsis / metabolism
  • Arabidopsis Proteins / genetics*
  • Arabidopsis Proteins / metabolism
  • Gene Expression Regulation, Plant*
  • Gene Regulatory Networks*
  • Genome, Plant*
  • High-Throughput Nucleotide Sequencing / methods*
  • Metabolic Networks and Pathways
  • Sequence Analysis, RNA / methods*
  • Transcriptome

Substances

  • Arabidopsis Proteins