Application of Multi-SOM clustering approach to macrophage gene expression analysis

Infect Genet Evol. 2009 May;9(3):328-36. doi: 10.1016/j.meegid.2008.09.009. Epub 2008 Oct 17.

Abstract

The production of increasingly reliable and accessible gene expression data has stimulated the development of computational tools to interpret such data and organize them efficiently. Clustering techniques are widely recognized as useful exploratory tools for gene expression data analysis: genes that show similar expression patterns over a wide range of experimental conditions can be clustered together. This relies on the hypothesis that genes belonging to the same cluster are coregulated and involved in related functions. Nevertheless, clustering algorithms still have limitations, particularly in estimating the number of clusters and interpreting hierarchical dendrograms, and both can significantly influence the outcome of the analysis. We propose here a multi-level clustering algorithm based on self-organizing maps (SOM), named Multi-SOM. Through the use of clustering validity indices, Multi-SOM overcomes the problem of estimating the number of clusters. To validate the proposed algorithm, we first applied it to supervised training data sets and evaluated the results by computing the number of misclassified samples. We then used Multi-SOM to analyze macrophage gene expression data generated in vitro from the blood of the same individual, with the macrophages infected by 5 different pathogens. This analysis led to the identification of sets of tightly coregulated genes across different pathogens. Gene Ontology tools were then used to estimate the biological significance of the clustering, which showed that the obtained clusters are coherent and biologically significant.
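The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' Multi-SOM implementation. It assumes that each level is a 1-D SOM trained on the prototype vectors of the level below it, that a sample's cluster at a given level is found by following its chain of best-matching units upward, that the Davies-Bouldin index serves as the clustering validity index used to choose a level, and that "misclassified samples" are counted by mapping each cluster to its majority class. All function names (`train_som`, `multi_som`, `best_level`, `misclassified`, etc.), the learning-rate schedule and the unit counts are hypothetical.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Online 1-D SOM (a chain of n_units prototypes); assumed linear decay."""
    rng = random.Random(seed)
    protos = [list(v) for v in rng.sample(data, n_units)]  # init from inputs
    radius0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr, radius = lr0 * (1.0 - frac), max(radius0 * (1.0 - frac), 0.5)
        for x in rng.sample(data, len(data)):  # shuffled pass over inputs
            # Best-matching unit (BMU): the prototype closest to x.
            bmu = min(range(n_units), key=lambda j: sq_dist(x, protos[j]))
            for j in range(n_units):
                # Gaussian neighborhood measured along the 1-D grid.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                protos[j] = [p + lr * h * (xi - p) for p, xi in zip(protos[j], x)]
    return protos


def assign(points, protos):
    """Index of the best-matching prototype for each point."""
    return [min(range(len(protos)), key=lambda j: sq_dist(x, protos[j]))
            for x in points]


def multi_som(data, unit_counts, seed=0):
    """Level k+1 is a smaller SOM trained on the prototypes of level k.
    Returns one partition of the original samples per level."""
    partitions, inputs, labels = [], data, None
    for level, n_units in enumerate(unit_counts):
        protos = train_som(inputs, n_units, seed=seed + level)
        unit_map = assign(inputs, protos)
        # Follow each sample's BMU chain: level 0 maps samples, later levels map units.
        labels = unit_map if labels is None else [unit_map[l] for l in labels]
        partitions.append(labels)
        inputs = protos
    return partitions


def davies_bouldin(data, labels):
    """Davies-Bouldin index of a partition (lower = better separated)."""
    clusters = {}
    for x, l in zip(data, labels):
        clusters.setdefault(l, []).append(x)
    if len(clusters) < 2:
        return float("inf")  # undefined for a single cluster
    cent = {k: [sum(col) / len(vs) for col in zip(*vs)] for k, vs in clusters.items()}
    scat = {k: sum(math.sqrt(sq_dist(v, cent[k])) for v in vs) / len(vs)
            for k, vs in clusters.items()}
    total = 0.0
    for i in clusters:
        total += max((scat[i] + scat[j]) / max(math.sqrt(sq_dist(cent[i], cent[j])), 1e-12)
                     for j in clusters if j != i)
    return total / len(clusters)


def best_level(data, unit_counts, seed=0):
    """Pick the level whose partition has the lowest Davies-Bouldin index."""
    partitions = multi_som(data, unit_counts, seed)
    scores = [davies_bouldin(data, p) for p in partitions]
    k = min(range(len(scores)), key=scores.__getitem__)
    return k, partitions[k]


def misclassified(labels, classes):
    """Count samples whose class differs from their cluster's majority class."""
    members = {}
    for l, c in zip(labels, classes):
        members.setdefault(l, []).append(c)
    return sum(len(cs) - Counter(cs).most_common(1)[0][1] for cs in members.values())


if __name__ == "__main__":
    # Toy labeled data: three well-separated 2-D Gaussian blobs.
    rng = random.Random(42)
    pts, truth = [], []
    for c, (cx, cy) in enumerate([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]):
        for _ in range(20):
            pts.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            truth.append(c)
    level, labels = best_level(pts, [12, 6, 3, 2])
    print("chosen level:", level, "misclassified:", misclassified(labels, truth))
```

Because each higher level has fewer units, the stacked maps yield a coarse-to-fine series of candidate partitions, and the validity index replaces a manual choice of the number of clusters. The 1-D chain and the Davies-Bouldin index are simplifications chosen for brevity; a 2-D grid or a different validity index could be substituted.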

Publication types

  • Validation Study

MeSH terms

  • Algorithms
  • Animals
  • Breast Neoplasms / diagnosis
  • Cluster Analysis*
  • Diabetes Mellitus / diagnosis
  • Female
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation
  • Humans
  • Macrophages / physiology*
  • Multigene Family
  • Neural Networks, Computer*
  • Oligonucleotide Array Sequence Analysis
  • Pattern Recognition, Automated
  • Protozoan Infections / genetics
  • Tuberculosis, Pulmonary / genetics