Haplotype inference for present-absent genotype data using previously identified haplotypes and haplotype patterns

Bioinformatics. 2007 Sep 15;23(18):2399-406. doi: 10.1093/bioinformatics/btm371. Epub 2007 Jul 21.

Abstract

Motivation: Killer immunoglobulin-like receptor (KIR) genes vary considerably in their presence or absence on a specific regional haplotype. Because presence or absence of these genes is largely detected using locus-specific genotyping technology, the distinction between homozygosity and hemizygosity is often ambiguous. The performance of methods for haplotype inference (e.g. PL-EM, PHASE) for KIR genes may be compromised by the large proportion of ambiguous data. At the same time, many haplotypes or partial haplotype patterns have been previously identified and can be incorporated to facilitate haplotype inference for unphased genotype data. To accommodate the increased ambiguity of present-absent genotyping of KIR genes, we developed a hybrid approach that combines a greedy algorithm with the Expectation-Maximization (EM) method and bases haplotype inference on previously identified haplotypes and haplotype patterns.
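The ambiguity described above, and the way a list of known haplotypes can constrain it, can be illustrated with a minimal sketch. This is not the HAPLO-IHP implementation: the greedy step is omitted, and the sketch assumes that haplotypes are 0/1 tuples over genes, that an observed genotype is the gene-wise OR of its two haplotypes, and that haplotype pairs follow Hardy-Weinberg proportions. The function names `compatible_pairs` and `em_frequencies` and the two-gene haplotypes in the usage note are hypothetical.

```python
from itertools import combinations_with_replacement


def compatible_pairs(genotype, haplotypes):
    """Return unordered index pairs (i, j), i <= j, of known haplotypes
    whose gene-wise OR reproduces the observed presence/absence genotype."""
    pairs = []
    for i, j in combinations_with_replacement(range(len(haplotypes)), 2):
        if all((a | b) == g
               for a, b, g in zip(haplotypes[i], haplotypes[j], genotype)):
            pairs.append((i, j))
    return pairs


def em_frequencies(genotypes, haplotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, restricted to a known list.

    Assumption (not from the abstract): pair probabilities are p_i^2 for
    i == j and 2 * p_i * p_j otherwise. Genotypes that no pair of known
    haplotypes can explain are skipped.
    """
    k = len(haplotypes)
    freqs = [1.0 / k] * k
    pair_lists = [compatible_pairs(g, haplotypes) for g in genotypes]
    for _ in range(n_iter):
        counts = [0.0] * k
        for pairs in pair_lists:
            # E-step: posterior weight of each compatible haplotype pair.
            weights = [(1 if i == j else 2) * freqs[i] * freqs[j]
                       for i, j in pairs]
            total = sum(weights)
            if total == 0:
                continue
            for (i, j), w in zip(pairs, weights):
                counts[i] += w / total
                counts[j] += w / total
        n = sum(counts)
        if n == 0:
            raise ValueError("no genotype is explained by the known haplotypes")
        # M-step: renormalise expected haplotype counts into frequencies.
        new_freqs = [c / n for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs
```

With two hypothetical haplotypes over two genes, `haps = [(1, 0), (1, 1)]`, the genotype `(1, 1)` is compatible with both the pair `(0, 1)` and the pair `(1, 1)`: the second gene may be hemizygous or homozygous, which is the ambiguity the abstract describes. Restricting the search to known haplotypes keeps the set of candidate pairs small, and the EM weights then apportion each ambiguous genotype among them.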

Results: We implemented this algorithm in a software package named HAPLO-IHP (Haplotype inference using identified haplotype patterns) and compared its performance with that of HAPLORE and PHASE on simulated KIR genotypes. We used five measures to evaluate the reliability of haplotype assignments and the accuracy of haplotype frequency estimates. Our method outperformed the two existing techniques on all five measures when either 60% or 25% of previously identified haplotypes were incorporated into the analyses.

Availability: HAPLO-IHP is available at http://www.soph.uab.edu/Statgenetics/People/KZhang/HAPLO-IHP/index.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Base Sequence
  • Chromosome Mapping / methods*
  • DNA Mutational Analysis / methods*
  • Genotype*
  • Haplotypes / genetics*
  • Molecular Sequence Data
  • Pattern Recognition, Automated / methods*
  • Receptors, Immunologic / genetics*
  • Receptors, KIR
  • Sequence Analysis, DNA / methods*

Substances

  • Receptors, Immunologic
  • Receptors, KIR