DBGSA: a novel method of distance-based gene set analysis

J Hum Genet. 2012 Oct;57(10):642-53. doi: 10.1038/jhg.2012.86. Epub 2012 Jul 12.

Abstract

Compared with single-gene functional analysis, gene set analysis (GSA) can extract more information from gene expression profiles. Several gene set methods have been proposed, but most cannot detect gene sets that contain a large number of minor-effect genes. Here, we propose a novel distance-based gene set analysis method. The distance between two phenotype groups, computed from the expression of the genes in a set, should be larger if that gene set is significantly associated with the given phenotype. We calculated the distance between two groups with different phenotypes, estimated the significance (P-values) using two permutation methods and performed multiple hypothesis testing adjustments. The method was applied to one simulated data set and three real data sets. After a comparison and literature verification, we determined that the gene resampling-based permutation method is more suitable for GSA, and that the centroid statistical and average linkage statistical distance methods are efficient, especially in detecting gene sets containing more minor-effect genes. We believe that this distance-based method will assist in finding functional gene sets that are significantly related to a complex trait. Additionally, we have prepared a simple and publicly available Perl and R package (http://bioinfo.hrbmu.edu.cn/dbgsa or http://cran.r-project.org/web/packages/DBGSA/).
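The workflow outlined in the abstract (a distance between phenotype groups, a gene resampling-based permutation P-value, then a multiple testing adjustment) can be illustrated with a minimal Python sketch. This is not the DBGSA package and does not reproduce its API. The function names, the Euclidean centroid distance, the choice to treat the two groups as sets of samples, and the Benjamini-Hochberg adjustment are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def centroid_distance(expr, labels, gene_idx):
    """Euclidean distance between the two phenotype centroids,
    restricted to the genes in gene_idx (assumed definition)."""
    sub = expr[gene_idx]                        # genes-in-set x samples
    c0 = sub[:, labels == 0].mean(axis=1)       # centroid of phenotype 0
    c1 = sub[:, labels == 1].mean(axis=1)       # centroid of phenotype 1
    return float(np.linalg.norm(c0 - c1))

def gene_resampling_pvalue(expr, labels, gene_idx, n_perm=1000, seed=0):
    """Empirical P-value: compare the observed distance with distances
    of random gene sets of the same size drawn from all genes."""
    rng = np.random.default_rng(seed)
    observed = centroid_distance(expr, labels, gene_idx)
    n_genes, k = expr.shape[0], len(gene_idx)
    null = np.array([
        centroid_distance(expr, labels,
                          rng.choice(n_genes, size=k, replace=False))
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the estimate strictly above zero.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values (one possible adjustment;
    the abstract does not name the procedure used)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest P-value downwards.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out
```

In this sketch, `expr` is a genes-by-samples NumPy array, `labels` is a 0/1 NumPy array with one entry per sample, and `gene_idx` lists the row indices of one gene set. Running `gene_resampling_pvalue` once per gene set and passing the resulting P-values to `benjamini_hochberg` mirrors the three steps named in the abstract.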

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Carcinoma, Non-Small-Cell Lung / genetics
  • Case-Control Studies
  • Computational Biology / methods*
  • Computer Simulation
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic
  • Genes, Neoplasm*
  • Humans
  • Logistic Models
  • Phenotype
  • Predictive Value of Tests
  • Sensitivity and Specificity
  • Software*
  • Transcriptome