Stochastic segmentation models for array-based comparative genomic hybridization data analysis

Biostatistics. 2008 Apr;9(2):290-307. doi: 10.1093/biostatistics/kxm031. Epub 2007 Sep 12.

Abstract

Array-based comparative genomic hybridization (array-CGH) is a high-throughput, high-resolution technique for studying the genetics of cancer. Analysis of array-CGH data typically involves estimating the underlying chromosome copy numbers from the log fluorescence ratios and segmenting the chromosome into regions that share the same copy number. For the analysis of array-CGH data, we propose a new stochastic segmentation model and an associated estimation procedure with attractive statistical and computational properties. An important benefit of this Bayesian segmentation model is that it yields explicit formulas for posterior means, which can be used to estimate the signal directly without performing segmentation. Our method can also estimate other quantities of the posterior distribution that are useful for assessing confidence in any given segmentation. We also propose an approximation method whose computation time is linear in sequence length, which makes our method practical for the newer, higher-density arrays. Simulation studies and applications to real array-CGH data illustrate the advantages of the proposed approach.
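
As a rough illustration of estimating a signal through posterior means rather than through an explicit segmentation, the sketch below runs a forward-backward pass over a simple discrete-state hidden Markov model. This is a stand-in technique chosen for brevity, not the model or estimation procedure proposed in the paper. The function name posterior_mean_signal, the grid of candidate levels, the stay probability, and the noise level are all hypothetical choices made for the example. Because the transition matrix is symmetric with one "stay" and one "switch" probability, each step costs time proportional to the number of levels, so the whole pass is linear in sequence length.

```python
import math
import random


def posterior_mean_signal(y, levels, stay_prob, sigma):
    """Posterior mean of a piecewise-constant signal under a toy discrete HMM.

    y: observed log ratios. levels: candidate mean levels (hypothetical grid,
    at least two). stay_prob: probability (below 1) that adjacent probes share
    a level. sigma: assumed Gaussian noise standard deviation.
    """
    k = len(levels)
    switch = (1.0 - stay_prob) / (k - 1)

    def emit(obs):
        # Gaussian log-likelihoods, shifted by their maximum to avoid underflow.
        logs = [-0.5 * ((obs - m) / sigma) ** 2 for m in levels]
        top = max(logs)
        return [math.exp(v - top) for v in logs]

    def normalize(v):
        total = sum(v)
        return [x / total for x in v]

    def step(v):
        # Multiply by the symmetric transition matrix in O(k) time.
        total = sum(v)
        return [stay_prob * x + switch * (total - x) for x in v]

    n = len(y)
    # Forward pass: filtered state probabilities, rescaled at every position.
    alpha = []
    for t in range(n):
        pred = step(alpha[-1]) if alpha else [1.0 / k] * k
        e = emit(y[t])
        alpha.append(normalize([p * w for p, w in zip(pred, e)]))

    # Backward pass: rescaled likelihood of the observations after position t.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        e = emit(y[t + 1])
        beta[t] = normalize(step([w * b for w, b in zip(e, beta[t + 1])]))

    # Combine both passes and average the levels under the posterior.
    means = []
    for a, b in zip(alpha, beta):
        gamma = normalize([p * q for p, q in zip(a, b)])
        means.append(sum(g * m for g, m in zip(gamma, levels)))
    return means


# Simulated data (not from the paper): a gain between positions 40 and 59.
random.seed(0)
truth = [0.0] * 40 + [1.0] * 20 + [0.0] * 40
y = [m + random.gauss(0.0, 0.2) for m in truth]
est = posterior_mean_signal(y, levels=[-1.0, 0.0, 1.0], stay_prob=0.95, sigma=0.2)
```

Each entry of est is a weighted average of the candidate levels, so it moves smoothly between levels near an uncertain breakpoint instead of committing to a single segmentation. The per-position state probabilities computed along the way are the kind of quantity that could support a confidence assessment of a proposed breakpoint.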

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bayes Theorem*
  • Chromosome Mapping / methods
  • Computational Biology / methods*
  • Cytogenetic Analysis / methods*
  • Gene Dosage
  • Gene Expression Profiling / methods
  • Genomics / methods
  • Humans
  • Neoplasms / genetics
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Probes / analysis*
  • Sequence Analysis, DNA / methods

Substances

  • Oligonucleotide Probes