Utility of Extrapolating Human S1500+ Genes to the Whole Transcriptome: Tunicamycin Case Study

Bioinform Biol Insights. 2020 Sep 29:14:1177932220952742. doi: 10.1177/1177932220952742. eCollection 2020.

Abstract

The TempO-Seq S1500+ platforms, now available for human, mouse, rat, and zebrafish, each measure a discrete set of genes that are representative of biological and pathway co-regulation across the entire genome of a given species. While measurement of these genes alone provides a direct assessment of gene expression activity, extrapolating expression values to the whole transcriptome (~26 000 genes in humans) can estimate expression of non-measured genes of interest and increase the power of pathway analysis algorithms by providing a larger background gene expression space. Here, we use data from primary hepatocytes of 54 donors that were treated with the endoplasmic reticulum (ER) stress inducer tunicamycin and then measured on the human S1500+ platform containing ~3000 representative genes. Measurements for the S1500+ genes were then used to extrapolate expression values for the remaining human transcriptome. As a case study of the improved downstream analysis achieved by extrapolation, the "measured only" and "whole transcriptome" (measured + extrapolated) gene sets were compared. Extrapolation increased the number of significant genes by 49%, bringing to the forefront many that are known to be associated with tunicamycin exposure. The extrapolation procedure also correctly identified established tunicamycin-related functional pathways reflected by coordinated changes in interrelated genes while maintaining the sample variability observed from the "measured only" genes. Extrapolation improved the gene- and pathway-level biological interpretations for a variety of downstream applications, including differential expression analysis, gene set enrichment pathway analysis, DAVID keyword analysis, Ingenuity Pathway Analysis, and NextBio correlated compound analysis.
The extrapolated data highlight the role of metabolism/metabolic pathways, the ER, immune response, and the unfolded protein response, all of which are key activities associated with tunicamycin exposure that were unrepresented or underrepresented in one or more of the analyses of the original "measured only" dataset. Furthermore, the inclusion of the extrapolated genes raised "tunicamycin" from third to first upstream regulator in Ingenuity Pathway Analysis and from sixth to second most correlated compound in NextBio analysis. Therefore, our case study suggests an approach to extend and enhance data from the S1500+ platform for improved insight into biological mechanisms and functional outcomes of diseases, drugs, and other perturbations.

Keywords: GeniE; S1500+; Transcriptomics; extrapolation; gene inference.
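
The abstract notes that pathway analysis depends on the size of the background gene space. As a rough illustration only, the sketch below computes a generic over-representation p-value (the upper tail of the hypergeometric distribution), which is one common way that a background size enters such an analysis. It is not the procedure used in the study or by any of the tools named above. The function name `enrichment_p_value` and all gene counts are hypothetical and chosen purely for demonstration.

```python
from math import comb


def enrichment_p_value(background, pathway, significant, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    background  -- number of genes in the background set (hypothetical input)
    pathway     -- number of background genes annotated to the pathway
    significant -- number of background genes called significant
    overlap     -- number of significant genes that fall in the pathway
    """
    if not (0 <= pathway <= background and 0 <= significant <= background):
        raise ValueError("pathway and significant must lie within the background")
    max_overlap = min(pathway, significant)
    if overlap > max_overlap:
        return 0.0  # an overlap this large is impossible
    # Number of ways to draw the significant genes from the background.
    total = comb(background, significant)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(pathway, k) * comb(background - pathway, significant - k)
        for k in range(max(overlap, 0), max_overlap + 1)
    )
    return tail / total


# Illustrative counts only; these are NOT values reported in the study.
p_measured = enrichment_p_value(background=3000, pathway=10, significant=300, overlap=4)
p_whole = enrichment_p_value(background=26000, pathway=60, significant=450, overlap=8)
print(f"measured only:       p = {p_measured:.3g}")
print(f"whole transcriptome: p = {p_whole:.3g}")
```

With a larger background, more members of a pathway are available to be observed, so the same test can draw on more evidence per pathway. Whether that lowers a given p-value depends entirely on the actual counts.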