A robust two-way semi-linear model for normalization of cDNA microarray data

BMC Bioinformatics. 2005 Jan 21;6:14. doi: 10.1186/1471-2105-6-14.

Abstract

Background: Normalization is a basic step in microarray data analysis. A proper normalization procedure ensures that the intensity ratios provide meaningful measures of relative expression values.

Methods: We propose a robust semiparametric method in a two-way semi-linear model (TW-SLM) for normalization of cDNA microarray data. This method does not make the usual assumptions underlying some of the existing methods. For example, it does not assume that: (i) the percentage of differentially expressed genes is small; or (ii) the numbers of up- and down-regulated genes are about the same, as required in the LOWESS normalization method. We conduct simulation studies to evaluate the proposed method and use a real data set from a specially designed microarray experiment to compare the performance of the proposed method with that of the LOWESS normalization approach.
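The role of assumptions (i) and (ii) can be illustrated with a small simulation. The sketch below is not the authors' TW-SLM estimator and not their simulation design. It assumes, for illustration only, that each log-ratio is an intensity-dependent bias plus a gene effect plus noise, and it normalizes with a simplified, non-robust local linear smoother in the spirit of LOWESS. The names `local_linear_smooth` and `simulate`, the sine-shaped bias, and all parameter values are hypothetical.

```python
import math
import random


def local_linear_smooth(x, y, frac=0.4):
    """Tricube-weighted local linear fit evaluated at every x.

    A simplified stand-in for LOWESS: same local weighting idea,
    but without the robustness iterations of the full algorithm.
    """
    n = len(x)
    k = max(3, int(frac * n))  # number of neighbours in each local fit
    fitted = []
    for x0 in x:
        # Indices of the k points closest to x0 on the intensity axis.
        nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:k]
        h = abs(x[nearest[-1]] - x0) or 1e-12  # local bandwidth
        w = [(1 - (abs(x[i] - x0) / h) ** 3) ** 3 for i in nearest]
        sw = sum(w)
        mx = sum(wi * x[i] for wi, i in zip(w, nearest)) / sw
        my = sum(wi * y[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (x[i] - mx) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (x[i] - mx) * (y[i] - my) for wi, i in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def simulate(n_genes=400, frac_up=0.4, seed=0):
    """Hypothetical single-array data: many up-regulated genes, none down."""
    rng = random.Random(seed)
    a = [rng.uniform(6, 14) for _ in range(n_genes)]          # mean log intensity
    beta = [1.0 if rng.random() < frac_up else 0.0 for _ in range(n_genes)]
    # Assumed additive form: log-ratio = intensity bias + gene effect + noise.
    m = [0.3 * math.sin(ai / 2) + b + rng.gauss(0, 0.2) for ai, b in zip(a, beta)]
    return a, m, beta


a, m, beta = simulate()
curve = local_linear_smooth(a, m)                 # estimated "bias" curve
normalized = [mi - ci for mi, ci in zip(m, curve)]

# Average normalized log-ratio of genes whose true effect is zero.
null = [v for v, b in zip(normalized, beta) if b == 0.0]
bias_null = sum(null) / len(null)
print(f"mean normalized log-ratio of unchanged genes: {bias_null:.3f}")
```

Because a large share of genes is up-regulated and none is down-regulated, the smoother absorbs part of the true gene effects into the fitted curve, so the unchanged genes should come out with a clearly negative average log-ratio after normalization. Setting `frac_up` close to zero in this sketch brings that average back toward zero, which corresponds to the setting in which assumptions (i) and (ii) hold.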

Results: In the simulations, the proposed method achieves lower mean square errors for estimated gene effects than the LOWESS normalization method. In the analysis of the real data set, the proposed method also yields more consistent results between the direct and the indirect comparisons and detects more differentially expressed genes than the LOWESS method.

Conclusions: Our simulation studies and the real data example indicate that the proposed robust TW-SLM method works at least as well as the LOWESS method and works better when the underlying assumptions for the LOWESS method are not satisfied. Therefore, it is a powerful alternative to the existing normalization methods.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Analysis of Variance
  • Calibration
  • Computational Biology / methods*
  • Computer Graphics
  • DNA, Complementary / metabolism*
  • Data Interpretation, Statistical
  • Gene Expression Profiling
  • Humans
  • Likelihood Functions
  • Linear Models
  • Models, Genetic
  • Models, Statistical
  • Nonlinear Dynamics
  • Oligonucleotide Array Sequence Analysis / methods*
  • Placenta / metabolism
  • Reference Standards
  • Regression Analysis
  • Software

Substances

  • DNA, Complementary