Identification of differentially expressed genes with multivariate outlier analysis

J Biopharm Stat. 2004 Aug;14(3):629-46. doi: 10.1081/BIP-200025654.

Abstract

DNA microarrays offer a powerful and effective technology for monitoring changes in the expression levels of thousands of genes simultaneously. The technology is widely applied to explore quantitative alterations in gene regulation in response to a variety of factors, including disease and toxicant exposure. A common task in analyzing microarray data is to identify the genes that are differentially expressed between two experimental conditions. Because microarray data involve a large number of genes, a small number of arrays, and high noise relative to the signal, many traditional approaches are ill-suited. In this paper, a multivariate mixture model is applied to model the expression levels of replicated arrays, treating the differentially expressed genes as outliers of the expression data. To detect the outliers of the multivariate mixture model, an effective and robust statistical method is applied to microarray analysis for the first time. The method identifies outliers by analyzing the kurtosis coefficient (KC) of the multivariate data arising from a mixture model after projection onto selected directions. We apply the multivariate KC algorithm to our microarray experiment comparing control and toxicant-treated samples. After data processing, the differentially expressed genes are identified among the 1824 genes on the UCLA M07 microarray chip. We also use RT-PCR and two robust statistical methods, minimum covariance determinant (MCD) and minimum volume ellipsoid (MVE), to verify the expression levels of the outlier genes identified by the KC algorithm. We conclude that this robust multivariate tool is practical and effective for detecting differentially expressed genes.
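The abstract does not spell out the KC algorithm, so the following is only a simplified sketch of kurtosis-based projection outlier detection under stated assumptions. Each gene is treated as a point whose coordinates are its log expression ratios across replicate arrays, so unchanged genes cluster near the origin and differentially expressed genes lie far from it. Instead of numerically optimizing the projection direction, the sketch scores the coordinate axes plus a set of random unit directions by the kurtosis coefficient of the projected data, keeps the maximum- and minimum-kurtosis directions, and flags genes whose robust z-score (median/MAD) on either projection exceeds a cutoff. The function names (`kurtosis`, `project`, `random_unit_vector`, `kc_outliers`), the 500-direction random search, the cutoff of 4, and the omission of a preliminary standardization (sphering) step are hypothetical choices, not the paper's settings.

```python
import math
import random
import statistics


def kurtosis(values):
    """Kurtosis coefficient m4 / m2**2 of a sample (3.0 for normal data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)


def project(points, direction):
    """Project each multivariate point onto a unit direction vector."""
    return [sum(x * d for x, d in zip(p, direction)) for p in points]


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def kc_outliers(points, n_directions=500, cutoff=4.0, seed=0):
    """Hypothetical simplified KC detector: returns indices of flagged points.

    Searches random directions instead of optimizing kurtosis numerically,
    and skips the sphering (standardization) step of a full implementation.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Candidate directions: the coordinate axes plus random unit vectors.
    candidates = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    candidates += [random_unit_vector(dim, rng) for _ in range(n_directions)]
    scored = [(kurtosis(project(points, d)), d) for d in candidates]
    # Max kurtosis exposes a few distant points; min kurtosis exposes a bimodal split.
    extremes = [max(scored, key=lambda t: t[0])[1], min(scored, key=lambda t: t[0])[1]]

    flagged = set()
    for d in extremes:
        proj = project(points, d)
        med = statistics.median(proj)
        # Scaled MAD: a robust spread estimate comparable to a standard deviation.
        mad = 1.4826 * statistics.median([abs(v - med) for v in proj])
        if mad == 0:
            continue
        flagged.update(i for i, v in enumerate(proj) if abs(v - med) / mad > cutoff)
    return sorted(flagged)


if __name__ == "__main__":
    # Synthetic log-ratios over 3 replicate arrays: 300 unchanged genes, 6 up-regulated.
    rng = random.Random(1)
    genes = [[rng.gauss(0.0, 0.3) for _ in range(3)] for _ in range(300)]
    genes += [[2.5 + rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(6)]
    print(kc_outliers(genes))
```

Keeping both extremes reflects the usual motivation for kurtosis-based projections: a direction with high kurtosis reveals a small group of points far from the bulk (the case where only a few genes change), while a direction with low kurtosis reveals a roughly bimodal split between two large groups.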

MeSH terms

  • Algorithms
  • Animals
  • Cadmium Chloride / toxicity
  • Data Interpretation, Statistical
  • Male
  • Mice
  • Mice, Inbred ICR
  • Microcomputers
  • Models, Statistical
  • Multivariate Analysis*
  • Mutagens / toxicity
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*
  • Reverse Transcriptase Polymerase Chain Reaction

Substances

  • Mutagens
  • Cadmium Chloride