Comparison of supervised clustering methods to discriminate genotoxic from non-genotoxic carcinogens by gene expression profiling

Mutat Res. 2005 Aug 4;575(1-2):17-33. doi: 10.1016/j.mrfmmm.2005.02.006. Epub 2005 Apr 19.

Abstract

Prediction of the toxic properties of chemicals based on modulation of gene expression profiles in exposed cells or animals is one of the major applications of toxicogenomics. Previously, we demonstrated that Pearson correlation analysis of gene expression profiles from treated HepG2 cells can correctly discriminate genotoxic from non-genotoxic carcinogens and predict their class. Because many different supervised clustering methods for discrimination and prediction are now available, we investigated whether applying the methods provided by the Whitehead Institute and Stanford University improved our initial prediction. Four supervised clustering methods were compared: Pearson correlation analysis (Pearson), nearest shrunken centroids analysis (NSC), K-nearest neighbour analysis (KNN) and weighted voting (WV). For each method, three approaches were followed: (1) using all the data points for all treatments, (2) excluding the samples with marginally affected gene expression profiles and (3) filtering out the gene expression signals that were hardly altered. On the complete data set, NSC, KNN and WV outperformed the Pearson test, but on the reduced data sets no clear difference was observed. Exclusion of samples with marginally affected profiles improved the prediction by all methods. The various prediction models selected gene sets of different compositions; across these sets, 27 genes appeared three times or more. These 27 genes are involved in many different biological processes and molecular functions, such as apoptosis, cell cycle control, regulation of transcription, and transporter activity, many of them related to the carcinogenic process. One gene, BAX, was selected in all 10 models, while ZFP36 was selected in 9, and AHR, MT1E and TTR in 8.
Summarising, this study demonstrates that several supervised clustering methods can be used to discriminate certain genotoxic from non-genotoxic carcinogens by gene expression profiling in vitro in HepG2 cells. None of the methods clearly outperforms the others.
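
As an illustration of the general idea behind correlation-based class prediction, the following minimal sketch assigns a sample to the class whose mean expression profile it correlates with most strongly, and estimates accuracy by leave-one-out testing. It is not the authors' procedure: the function names, the use of class centroids, the leave-one-out scheme and the synthetic data are all assumptions made for this example, and the abstract does not specify these details.

```python
import numpy as np


def pearson_centroid_predict(train, labels, sample):
    """Return the class whose mean profile correlates best with `sample`.

    Hypothetical helper for illustration; rows of `train` are samples,
    columns are genes.
    """
    labels = np.asarray(labels)
    scores = {}
    for cls in np.unique(labels):
        centroid = train[labels == cls].mean(axis=0)
        # Pearson correlation between the sample and the class centroid
        scores[str(cls)] = float(np.corrcoef(sample, centroid)[0, 1])
    best = max(scores, key=scores.get)
    return best, scores


def leave_one_out_accuracy(profiles, labels):
    """Hold out each sample in turn and predict it from the remaining ones."""
    labels = np.asarray(labels)
    hits = 0
    for i in range(len(labels)):
        keep = np.arange(len(labels)) != i
        predicted, _ = pearson_centroid_predict(
            profiles[keep], labels[keep], profiles[i]
        )
        hits += int(predicted == labels[i])
    return hits / len(labels)


# Synthetic data (assumption, NOT the study's measurements):
# 20 genes, 6 samples per class, each class with its own expression pattern.
rng = np.random.default_rng(0)
pattern_gtx = np.concatenate([np.ones(10), np.zeros(10)])
pattern_ngtx = np.concatenate([np.zeros(10), np.ones(10)])
profiles = np.vstack(
    [pattern_gtx + rng.normal(0, 0.2, 20) for _ in range(6)]
    + [pattern_ngtx + rng.normal(0, 0.2, 20) for _ in range(6)]
)
labels = np.array(["GTX"] * 6 + ["NGTX"] * 6)

accuracy = leave_one_out_accuracy(profiles, labels)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Approach (2) in the abstract would correspond to dropping rows (samples) from `profiles` before prediction, and approach (3) to dropping columns (genes); the thresholds for either step are not given in the abstract and are left out here.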

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Carcinogens / classification
  • Carcinogens / toxicity*
  • Cell Line, Tumor
  • Cluster Analysis
  • Gene Expression Profiling
  • Humans
  • Models, Statistical
  • Mutagens / toxicity*
  • Oligonucleotide Array Sequence Analysis
  • Toxicity Tests
  • Xenobiotics / toxicity*

Substances

  • Carcinogens
  • Mutagens
  • Xenobiotics