Maximum likelihood model based on minor allele frequencies and weighted Max-SAT formulation for haplotype assembly

J Theor Biol. 2014 Jun 7:350:49-56. doi: 10.1016/j.jtbi.2014.01.036. Epub 2014 Jan 31.

Abstract

Human haplotypes carry essential information about SNPs, which in turn is valuable for studies that relate diseases to their potential genetic causes, such as Genome Wide Association Studies. Because determining haplotypes directly is expensive, and because of recent progress in high-throughput sequencing, there is growing interest in haplotype assembly, the problem of reconstructing a pair of haplotypes from a set of aligned fragments. Although the problem has been studied extensively and a number of algorithms have been proposed, more accurate methods are still beneficial given the importance of haplotype information. In this paper, we first develop a probabilistic model that incorporates the Minor Allele Frequency (MAF) of SNP sites, which existing maximum likelihood models omit. We then show that the probabilistic model reduces to the Minimum Error Correction (MEC) model when the MAF information is omitted and some approximations are made. This result provides novel theoretical support for the MEC, despite some criticism of it in the recent literature. Next, under the same approximations, we simplify the model to an extension of the MEC that uses the MAF information. Finally, we extend the haplotype assembly algorithm HapSAT by developing a weighted Max-SAT formulation for the simplified model, which is evaluated empirically with positive results.
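
For readers unfamiliar with the MEC objective named in the abstract, the following is a minimal sketch of how an MEC score is commonly computed: each aligned fragment is assigned to the closer of two candidate haplotypes, and the mismatches (the corrections needed) are summed. The binary allele encoding, the "-" gap symbol, and the function names are assumptions made for illustration. The optional `weights` argument is a hypothetical placeholder for per-site costs; the abstract does not give the MAF-derived weights of the paper's extended model, so this is not the authors' formulation.

```python
from typing import Optional, Sequence

GAP = "-"  # assumed symbol for SNP sites a fragment does not cover


def fragment_distance(fragment: str, haplotype: str,
                      weights: Sequence[float]) -> float:
    """Sum the weights of covered sites where fragment and haplotype differ."""
    return sum(w for f, h, w in zip(fragment, haplotype, weights)
               if f != GAP and f != h)


def mec_score(fragments: Sequence[str], h1: str, h2: str,
              weights: Optional[Sequence[float]] = None) -> float:
    """Total cost of assigning each fragment to its closer haplotype.

    With weights=None every site costs 1, which is the plain MEC count.
    Non-uniform weights are a hypothetical stand-in for per-site costs,
    not the MAF-based weights derived in the paper.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    if weights is None:
        weights = [1.0] * len(h1)
    total = 0.0
    for frag in fragments:
        total += min(fragment_distance(frag, h1, weights),
                     fragment_distance(frag, h2, weights))
    return total


# Toy data (invented for illustration): four fragments over four SNP sites.
fragments = ["01-0", "-101", "10-1", "0100"]
print(mec_score(fragments, "0100", "1011"))  # 1.0: one mismatch in "-101"
```

Haplotype assembly under MEC searches for the pair (h1, h2) that minimizes this score; the sketch only evaluates a given pair and does not perform that search.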

Keywords: Algorithms; Haplotype reconstruction; Minimum error correction; Single individual haplotyping; Single nucleotide polymorphism.

MeSH terms

  • Algorithms*
  • Databases, Genetic
  • Gene Frequency / genetics*
  • Haplotypes / genetics*
  • Humans
  • Likelihood Functions
  • Models, Genetic