Identification of cancer-associated gene clusters and genes via clustering penalization

Stat Interface. 2009 Jan 1;2(1):1-11. doi: 10.4310/sii.2009.v2.n1.a1.

Abstract

Identification of genes associated with cancer development and progression using microarray data is challenging because of the high dimensionality and cluster structure of gene expressions. Here the clusters are composed of multiple genes with coordinated biological functions and/or correlated expressions. In this article, we first propose a hybrid approach for clustering gene expressions. The hybrid approach uses both pathological pathway information and correlations of gene expressions. We then propose using the group bridge, a novel clustering penalization approach, to analyze cancer microarray data. The group bridge approach explicitly accounts for the cluster structure of gene expressions, and is capable of selecting both gene clusters and the genes within those selected clusters that are associated with cancer. We also develop an iterative algorithm for computing the group bridge estimator. Analysis of three cancer microarray datasets shows that the proposed approach can identify biologically meaningful gene clusters and genes within those identified clusters.
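
The abstract does not give the form of the penalty. As a rough illustration of how a clustering penalty can select at both the cluster level and the gene level, the sketch below evaluates a bridge-type penalty applied to the L1 norm of each cluster's coefficients, lam * sum_j c_j * (sum of |beta_k| over cluster j)^gamma with 0 < gamma < 1. The function names, the default gamma = 0.5, and the cluster weights c_j = |A_j|^(1 - gamma) are assumptions made for this example and are not taken from the abstract; the iterative algorithm for computing the estimator is not reproduced here.

```python
import numpy as np


def group_bridge_penalty(beta, groups, lam=1.0, gamma=0.5):
    """Evaluate lam * sum_j c_j * (L1 norm of beta in cluster j) ** gamma.

    beta   : 1-D array of gene coefficients
    groups : list of index lists, one per gene cluster (hypothetical layout)
    gamma  : bridge index, assumed to lie strictly between 0 and 1
    """
    beta = np.asarray(beta, dtype=float)
    total = 0.0
    for idx in groups:
        idx = np.asarray(idx, dtype=int)
        # Assumed weight that adjusts for cluster size.
        c_j = len(idx) ** (1.0 - gamma)
        l1_norm = np.abs(beta[idx]).sum()
        total += c_j * l1_norm ** gamma
    return lam * total


def selected_clusters_and_genes(beta, groups):
    """Return {cluster index: [gene indices with nonzero coefficient]}.

    Clusters whose coefficients are all zero are omitted, which mirrors
    two-level selection: clusters first, then genes within them.
    """
    beta = np.asarray(beta, dtype=float)
    selected = {}
    for j, idx in enumerate(groups):
        nonzero = [int(k) for k in idx if beta[k] != 0.0]
        if nonzero:
            selected[j] = nonzero
    return selected


# Toy example: 5 genes in 2 clusters; cluster 0 is entirely zero.
beta_hat = [0.0, 0.0, 3.0, -1.0, 0.0]
clusters = [[0, 1], [2, 3, 4]]
penalty = group_bridge_penalty(beta_hat, clusters)
chosen = selected_clusters_and_genes(beta_hat, clusters)
```

In this toy example only the second cluster contributes to the penalty, and within it only two of the three genes have nonzero coefficients. The concave outer power (gamma < 1) is what lets whole clusters drop out, while the inner L1 norm allows individual genes inside a retained cluster to be set to zero.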