Disease association tests by inferring ancestral haplotypes using a hidden Markov model

Bioinformatics. 2008 Apr 1;24(7):972-8. doi: 10.1093/bioinformatics/btn071. Epub 2008 Feb 23.

Abstract

Motivation: Most genome-wide association studies rely on single nucleotide polymorphism (SNP) analyses to identify causal loci. The increased stringency required for genome-wide analyses (with a per-SNP significance threshold typically around 10⁻⁷) means that many real signals will be missed. Thus, it remains highly relevant to develop methods with improved power at low type I error. Haplotype-based methods provide a promising approach; however, they suffer from statistical problems such as an abundance of rare haplotypes and ambiguity in defining haplotype block boundaries.

Results: We have developed an ancestral haplotype clustering (AncesHC) association method which addresses many of these problems. It can be applied to biallelic or multiallelic markers typed in haploid, diploid or multiploid organisms, and also handles missing genotypes. Our model is free from the assumption of a rigid block structure but recognizes a block-like structure if it exists in the data. We employ a Hidden Markov Model (HMM) to cluster the haplotypes into groups of predicted common ancestral origin. We then test each cluster for association with disease by comparing the numbers of cases and controls with 0, 1 and 2 chromosomes in the cluster. We demonstrate the power of this approach by simulation of case-control status under a range of disease models for 1500 outcrossed mice originating from eight inbred lines. Our results suggest that AncesHC has substantially more power than single-SNP analyses to detect disease association, and is also more powerful than the cladistic haplotype clustering method CLADHC.
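The per-cluster test described above can be illustrated with a small sketch. The abstract does not say which test statistic AncesHC uses, so this example assumes a plain Pearson chi-square test on a 2 x 3 table (controls and cases against 0, 1 or 2 chromosomes in the cluster). The function names `cluster_dosage_table` and `chi_square_cluster_test` are hypothetical and are not part of the AncesHC software.

```python
import math


def cluster_dosage_table(dosages, is_case):
    """Count individuals by case status and cluster dosage.

    dosages: number of chromosomes (0, 1 or 2) each individual carries
             in the ancestral cluster being tested.
    is_case: True for cases, False for controls.
    Returns [[controls with 0, 1, 2], [cases with 0, 1, 2]].
    """
    table = [[0, 0, 0], [0, 0, 0]]
    for dosage, case in zip(dosages, is_case):
        table[1 if case else 0][dosage] += 1
    return table


def chi_square_cluster_test(table):
    """Pearson chi-square test on the 2 x 3 table (an assumed choice of test).

    Columns with no observations are dropped, which reduces the degrees
    of freedom. Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    if min(row_totals) == 0:
        raise ValueError("need at least one case and one control")
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    used_cols = 0
    for j, col_total in enumerate(col_totals):
        if col_total == 0:
            continue  # empty dosage class carries no information
        used_cols += 1
        for i, row_total in enumerate(row_totals):
            expected = row_total * col_total / n
            stat += (table[i][j] - expected) ** 2 / expected

    df = used_cols - 1  # (rows - 1) * (cols - 1) with two rows
    if df == 0:
        p_value = 1.0
    elif df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 df
    else:
        p_value = math.exp(-stat / 2.0)  # chi-square tail, 2 df
    return stat, df, p_value


# Usage with made-up counts (not data from the paper):
table = [[20, 10, 0], [0, 10, 20]]
stat, df, p_value = chi_square_cluster_test(table)
print(stat, df, p_value)
```

In a genome-wide setting, the resulting p-value would be compared against a stringent threshold such as the one mentioned in the Motivation, and the test would be repeated for each cluster at each locus.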

Availability: The software can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin.

MeSH terms

  • Algorithms*
  • Biological Evolution*
  • Chromosome Mapping / methods*
  • Computer Simulation
  • Genetic Predisposition to Disease / genetics*
  • Haplotypes / genetics*
  • Markov Chains
  • Models, Genetic*
  • Models, Statistical
  • Pattern Recognition, Automated / methods