High-performance gene expression module analysis tool and its application to chemical toxicity data

Wataru Fujibuchi; Hyeryung Kim; Yoshifumi Okada; Takeaki Taniguchi; Hideko Sone

doi:10.1007/978-1-60761-232-2_5

Methods Mol Biol. 2009:577:55-65. doi: 10.1007/978-1-60761-232-2_5.

Authors

Wataru Fujibuchi¹, Hyeryung Kim, Yoshifumi Okada, Takeaki Taniguchi, Hideko Sone

Affiliation

¹ National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan.

PMID: 19718508

Abstract

Gene clustering is one of the main themes of data mining approaches in bioinformatics. Although it is a powerful way to analyze gene function, interpreting the results becomes increasingly difficult once the number of experiments (samples) reaches the hundreds or more. A newer type of clustering called "biclustering," in which genes and experiments are co-clustered in large-scale gene expression data, has been extensively studied in the last decade. We have developed "SAMURAI," an original program that detects all the biclusters, or "gene modules," whose genes have expression patterns similar to a query profile, using the ultrafast data mining algorithm Linear-time Closed itemset Miner (LCM). Using a chemical toxicity dataset from J&J rat liver experiments, we compiled an exhaustive dictionary of gene modules by searching for gene modules with each chemical exposure experiment as the query. Through the module analysis, we found that our program can detect up- and down-regulated gene sets that significantly represent particular GO functions or KEGG pathways, thereby unraveling reactions and mechanisms common to different toxicochemical treatments of hepatocytes.
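
The query-based module search described in the abstract can be illustrated with a toy sketch. This is not the SAMURAI implementation and does not use LCM: it enumerates closed gene sets by brute-force intersection, which is only practical for tiny inputs. The similarity rule (a gene "matches" the query in an experiment when its value lies within a tolerance `tol` of the query value), the function names `binarize` and `closed_modules`, and the example data are all assumptions made for illustration.

```python
def binarize(expr, query, tol):
    """Map each experiment to the set of gene indices whose value is
    within `tol` of the query profile (assumed similarity rule)."""
    return {
        name: frozenset(
            g for g, (v, q) in enumerate(zip(values, query)) if abs(v - q) <= tol
        )
        for name, values in expr.items()
    }


def closed_modules(transactions, min_genes=2, min_experiments=2):
    """Enumerate closed gene sets by repeated intersection (naive, not LCM).

    Every closed gene set is the intersection of some group of experiments'
    gene sets, so intersecting until no new set appears finds them all.
    """
    closed = set(transactions.values())
    frontier = set(closed)
    while frontier:
        new = set()
        for a in frontier:
            for b in closed:
                c = a & b
                if c and c not in closed:
                    new.add(c)
        closed |= new
        frontier = new

    modules = []
    for genes in closed:
        # Experiments in which every gene of the set matches the query.
        support = sorted(n for n, t in transactions.items() if genes <= t)
        if len(genes) >= min_genes and len(support) >= min_experiments:
            modules.append((sorted(genes), support))
    return sorted(modules)


# Hypothetical toy data: three exposure experiments, four genes.
expr = {
    "chemA": [2.0, 1.8, -1.5, 0.1],
    "chemB": [1.9, 2.1, -1.4, 1.0],
    "chemC": [2.2, 0.0, -1.6, 0.2],
}
query = [2.0, 2.0, -1.5, 0.0]

modules = closed_modules(binarize(expr, query, tol=0.5))
for genes, experiments in modules:
    print(genes, experiments)
```

Each printed pair is one candidate module: a maximal set of genes that all resemble the query across the listed experiments. Repeating the search with every experiment as the query, as the abstract describes, would yield a dictionary of modules; a linear-time miner such as LCM replaces the quadratic intersection loop when the data are large.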

MeSH terms

Algorithms
Animals
Cluster Analysis
Computational Biology
Databases, Factual
Gene Expression Profiling / statistics & numerical data*
Liver / drug effects
Liver / metabolism
Molecular Biology / methods
Rats
Toxicology / statistics & numerical data*