Clustering gene expression data using adaptive double self-organizing map

Physiol Genomics. 2003 Jun 24;14(1):35-46. doi: 10.1152/physiolgenomics.00138.2002.

Abstract

This paper presents a novel clustering technique known as the adaptive double self-organizing map (ADSOM). ADSOM has a flexible topology and performs clustering and cluster visualization simultaneously, thereby requiring no a priori knowledge of the number of clusters. ADSOM builds on a recently introduced technique known as the double self-organizing map (DSOM). DSOM combines features of the popular self-organizing map (SOM) with two-dimensional position vectors, which serve as a visualization tool for deciding how many clusters are needed. Although DSOM addresses the problem of identifying an unknown number of clusters, its free parameters are difficult to control in a way that guarantees correct results and convergence. ADSOM updates its free parameters during training, and it allows its position vectors to converge to a fairly consistent number of clusters, provided that its initial number of nodes is greater than the expected number of clusters. The number of clusters can be identified by visually counting the clusters formed by the position vectors after training. A novel index is introduced based on hierarchical clustering of the final locations of the position vectors. The index allows automated detection of the number of clusters, thereby reducing the human error that could be incurred from counting clusters visually. The reliability of ADSOM in identifying the number of clusters is demonstrated by applying it to publicly available gene expression data from multiple biological systems such as yeast, human, and mouse. ADSOM's performance in detecting the number of clusters is compared with that of a model-based clustering method.
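
The idea of counting clusters automatically from the final position vectors can be illustrated with a minimal sketch. This is not the index defined in the paper, which the abstract does not specify. It assumes single-linkage agglomerative clustering of the two-dimensional positions and a simple "largest gap in merge distance" rule for choosing where to stop merging. The function names and the sample points are hypothetical and chosen only for illustration.

```python
import math


def single_linkage_merge_distances(points):
    """Return the distances at which single-linkage clustering merges clusters.

    Assumption: single linkage is used here for simplicity; the abstract
    only says the index is based on hierarchical clustering.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest nearest-neighbor distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges


def estimate_cluster_count(points):
    """Estimate the number of clusters with a largest-gap heuristic.

    Assumption: merges before the largest jump in merge distance join
    points within a cluster, and merges after it join separate clusters.
    With fewer than three points there is no gap to inspect, so 1 is returned.
    """
    merges = single_linkage_merge_distances(points)
    if len(merges) < 2:
        return 1
    gaps = [merges[k + 1] - merges[k] for k in range(len(merges) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    # After the first k + 1 merges, len(points) - (k + 1) clusters remain.
    return len(points) - (k + 1)


# Hypothetical final positions of nine nodes that settled into three groups.
final_positions = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
]
print(estimate_cluster_count(final_positions))
```

A rule of this kind replaces the visual count with a reproducible one, but it is sensitive to the linkage criterion and to how well separated the position vectors are after training, so it should be read as an illustration of the principle rather than a substitute for the published index.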

MeSH terms

  • Animals
  • Cell Cycle Proteins / genetics
  • Cell Line
  • Chromosome Mapping / methods*
  • Chromosome Mapping / statistics & numerical data*
  • Cluster Analysis
  • Computational Biology / methods
  • Computational Biology / statistics & numerical data
  • Fibroblasts / chemistry
  • Fibroblasts / metabolism
  • GTP-Binding Proteins / genetics
  • Gene Expression Profiling / statistics & numerical data*
  • Gene Expression Regulation / genetics*
  • Gene Expression Regulation, Enzymologic / genetics
  • Gene Expression Regulation, Fungal / genetics
  • Gene Expression Regulation, Neoplastic / genetics
  • Genes, Fungal / genetics
  • Genes, Neoplasm / genetics
  • Genes, cdc
  • Humans
  • Mice
  • Protein Kinases / genetics
  • Saccharomyces cerevisiae / enzymology
  • Saccharomyces cerevisiae / genetics
  • Tumor Cells, Cultured

Substances

  • CDC15 protein
  • Cell Cycle Proteins
  • Protein Kinases
  • GTP-Binding Proteins