Self-Contained Statistical Analysis of Gene Sets

PLoS One. 2016 Oct 6;11(10):e0163918. doi: 10.1371/journal.pone.0163918. eCollection 2016.

Abstract

Microarrays are a powerful tool for studying differential gene expression. However, they often generate long lists of differentially expressed genes, and unraveling meaningful biological processes from these lists can be challenging. For this reason, investigators have sought to quantify the statistical significance of compiled gene sets rather than of individual genes. The gene sets are typically organized around a biological theme or pathway. We compute correlations between different gene set tests and elect to use Fisher's self-contained method for gene set analysis. We improve Fisher's differential expression analysis of a gene set by limiting the p-value of an individual gene within the gene set, which prevents a small percentage of genes from determining the statistical significance of the entire set. We also compute dependencies among genes within the set to determine which genes are statistically linked. The method is applied to T-ALL (T-lineage Acute Lymphoblastic Leukemia) to identify differentially expressed gene sets between T-ALL and normal patients, and between T-ALL and AML (Acute Myeloid Leukemia) patients.
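As an illustration of the p-value limiting idea described above, the following is a minimal sketch of Fisher's combined probability test with a lower bound applied to each gene's p-value. It is not the authors' implementation: the function names, the `floor` parameter, and the example floor of 0.01 are assumptions made here for illustration, since the abstract does not state the limit used or how the null distribution is obtained. The sketch refers the statistic to a chi-square distribution with 2k degrees of freedom, which assumes independent genes. Under that assumption the reference distribution becomes conservative once p-values are floored, and it does not account for the gene dependencies the abstract mentions.

```python
import math


def fisher_statistic(p_values, floor=None):
    """Fisher's statistic X = -2 * sum(ln p_i).

    If `floor` is given (a hypothetical parameter for this sketch), each
    p-value is raised to at least `floor`, so no single gene can contribute
    more than -2 * ln(floor) to the statistic.
    """
    if floor is not None:
        p_values = [max(p, floor) for p in p_values]
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_df(x, k):
    """P(X > x) for a chi-square variable with 2k degrees of freedom.

    For even degrees of freedom the survival function has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values, floor=None):
    """Combined gene-set p-value, assuming independent per-gene p-values."""
    stat = fisher_statistic(p_values, floor)
    return chi2_sf_even_df(stat, len(p_values))


if __name__ == "__main__":
    # One extreme gene among otherwise unremarkable ones (made-up values).
    gene_p = [1e-12, 0.5, 0.6, 0.7]
    print("unlimited:", fisher_combined_p(gene_p))
    print("floored  :", fisher_combined_p(gene_p, floor=0.01))
```

With the made-up p-values in the usage example, the unlimited statistic is dominated by the single extreme gene, while the floored version yields a larger combined p-value, which is the behavior the limiting step is meant to produce.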

MeSH terms

  • Child
  • Gene Expression Profiling*
  • Humans
  • Oligonucleotide Array Sequence Analysis*
  • Precursor T-Cell Lymphoblastic Leukemia-Lymphoma / genetics
  • Statistics as Topic / methods*