KMeans greedy search hybrid algorithm for biclustering gene expression data

Adv Exp Med Biol. 2010:680:181-8. doi: 10.1007/978-1-4419-5913-3_21.

Abstract

Microarray technology demands algorithms capable of extracting novel and useful patterns such as biclusters. A bicluster is a submatrix of the gene expression data matrix in which the genes show highly correlated activity across all conditions in the submatrix. A measure called Mean Squared Residue (MSR) is used to evaluate the coherence of rows and columns within the submatrix. In this paper, a KMeans greedy search hybrid algorithm is developed for finding biclusters in gene expression data. The algorithm has two steps. In the first step, high-quality bicluster seeds are generated using the KMeans clustering algorithm. In the second step, these seeds are enlarged by adding more genes and conditions using a greedy strategy. The objective is to find biclusters of maximum size with an MSR value lower than a given threshold. The biclusters obtained by this algorithm on both benchmark datasets are of high quality. The statistical significance and biological relevance of the biclusters are verified using the Gene Ontology database.
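
The MSR score and the greedy enlargement step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The MSR is computed in its usual form, the mean over the submatrix of (a_ij - row mean - column mean + overall mean)^2. The abstract does not state the exact greedy rule, so the sketch assumes one: at each pass, add the single gene (row) or condition (column) that gives the lowest MSR, and stop when no addition keeps the MSR below the threshold. The function names mean_squared_residue and greedy_enlarge are hypothetical, and the seed is taken as given rather than produced by KMeans.

```python
import numpy as np


def mean_squared_residue(data, rows, cols):
    """MSR of the submatrix of `data` selected by `rows` and `cols`."""
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    overall_mean = sub.mean()
    residue = sub - row_means - col_means + overall_mean
    return float(np.mean(residue ** 2))


def greedy_enlarge(data, seed_rows, seed_cols, threshold):
    """Grow a seed bicluster one row or column at a time.

    Assumed rule (not taken from the paper): pick the candidate with
    the lowest resulting MSR, provided that MSR stays below `threshold`.
    """
    rows, cols = list(seed_rows), list(seed_cols)
    while True:
        best = None  # (score, axis, index) of the best candidate so far
        for r in range(data.shape[0]):
            if r not in rows:
                score = mean_squared_residue(data, rows + [r], cols)
                if score < threshold and (best is None or score < best[0]):
                    best = (score, "row", r)
        for c in range(data.shape[1]):
            if c not in cols:
                score = mean_squared_residue(data, rows, cols + [c])
                if score < threshold and (best is None or score < best[0]):
                    best = (score, "col", c)
        if best is None:
            break  # no addition keeps the MSR below the threshold
        if best[1] == "row":
            rows.append(best[2])
        else:
            cols.append(best[2])
    return sorted(rows), sorted(cols)


# Toy data (made up for illustration): rows 0-2 follow an additive
# pattern, which has zero MSR; row 3 does not follow it.
toy = np.add.outer(np.arange(4.0), np.arange(5.0))
toy[3] = [10.0, -7.0, 3.0, 25.0, -40.0]
print(greedy_enlarge(toy, [0, 1], [0, 1], threshold=1e-9))
```

Each pass rescans every remaining row and column, which is simple but costly on full microarray matrices. In the hybrid scheme the seed would come from the KMeans step instead of being chosen by hand.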

Publication types

  • Comparative Study

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Computational Biology
  • Databases, Genetic
  • Humans
  • Lymphoma / genetics
  • Multigene Family
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*
  • Saccharomyces cerevisiae / genetics
  • Search Engine