Subdimension-based similarity measure for DNA microarray data clustering

Phys Rev E Stat Nonlin Soft Matter Phys. 2006 Oct;74(4 Pt 1):041906. doi: 10.1103/PhysRevE.74.041906. Epub 2006 Oct 9.

Abstract

Microarray data analysis is useful for understanding biological processes, and a number of clustering algorithms have been applied to this task. However, the performance of these methods can be significantly degraded by the presence of nonsignificant conditions. In this paper, we propose a robust clustering algorithm based on a similarity measure. The key concept of the proposed measure is to assess the similarity between two data points by their subdimensions. For example, assume that x1, x2, and x3 are ten-dimensional data vectors. The data point x3 is said to be closer to x1 than x2 is if, in more than half of the dimensions, x3 is closer to x1 than x2 is. Thus, if two patterns are very similar except for a small number of features, this measure will preserve the similarity. We performed eight experiments to test the robustness of the proposed method, using three synthetic data sets, three real-world data sets, and two microarray data sets. We also compared the proposed method with four other clustering algorithms. Experimental results show that the proposed method yields better results than the existing clustering algorithms tested.
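
The ten-dimensional example can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact formula, so the per-dimension absolute difference, the strict majority vote, the function name closer_by_subdimensions, and the toy vectors are all assumptions made here for illustration. The sketch reads "x3 is closer to x1 than x2" as a dimension-by-dimension comparison of x3 and x2 against x1.

```python
import numpy as np

def closer_by_subdimensions(x1, x2, x3):
    """Return True if x3 is closer to x1 than x2 is in more than half
    of the dimensions (assumed reading of the abstract's example)."""
    x1, x2, x3 = (np.asarray(v, dtype=float) for v in (x1, x2, x3))
    d3 = np.abs(x3 - x1)  # per-dimension distance of x3 from x1
    d2 = np.abs(x2 - x1)  # per-dimension distance of x2 from x1
    votes = np.count_nonzero(d3 < d2)  # dimensions where x3 wins
    return bool(votes > x1.size / 2)

# Toy data (hypothetical): x3 matches x1 except in two dimensions,
# which stand in for nonsignificant conditions with large deviations.
x1 = np.zeros(10)
x2 = np.full(10, 1.0)
x3 = np.zeros(10)
x3[:2] = 100.0

# Euclidean distance is dominated by the two deviating dimensions.
euclid_x2 = np.linalg.norm(x2 - x1)
euclid_x3 = np.linalg.norm(x3 - x1)

print(closer_by_subdimensions(x1, x2, x3), euclid_x3 > euclid_x2)
```

In this toy case the Euclidean distance ranks x2 as the nearer point, while the dimension-wise vote ranks x3 as nearer because it agrees with x1 in eight of the ten dimensions. This is the kind of robustness to a few deviating features that the abstract describes.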

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Cluster Analysis*
  • Gene Expression Profiling / methods*
  • Multigene Family
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods*