Markov Models for inferring copy number variations from genotype data on Illumina platforms

Hum Hered. 2009;68(1):1-22. doi: 10.1159/000210445. Epub 2009 Apr 1.

Abstract

Background/aims: Illumina genotyping arrays provide information on DNA copy number. Current methodology for their analysis assumes linkage equilibrium across adjacent markers. This is unrealistic, given the markers high density, and can result in reduced specificity. Another limitation of current methods is that they cannot be directly applied to the analysis of multiple samples with the goal of detecting copy number polymorphisms and their association with traits of interest.

Methods: We propose a new Hidden Markov Model for Illumina genotype data, that takes into account linkage disequilibrium between adjacent loci. Our framework also allows for location specific deletion/duplication rates. When multiple samples are available, we describe a methodology for their analysis that simultaneously reconstructs the copy number states in each sample and identifies genomic locations with increased variability in copy number in the population. This approach can be extended to test association between copy number variants and a disease trait.

Results and conclusions: We show that taking into account linkage disequilibrium between adjacent markers can increase the specificity of a HMM in reconstructing copy number variants, especially single copy deletions. Our multisample approach is computationally practical and can increase the power of association studies.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Gene Deletion
  • Gene Dosage*
  • Genotype
  • Humans
  • Linkage Disequilibrium
  • Markov Chains*
  • Polymorphism, Genetic*