New multilocus linkage disequilibrium measure for tag SNP selection

J Bioinform Comput Biol. 2017 Feb;15(1):1750001. doi: 10.1142/S0219720017500019.

Abstract

Numerous approaches have been proposed for selecting an optimal set of tag single-nucleotide polymorphisms (SNPs), and most of them are based on linkage disequilibrium (LD). Classical LD measures, such as D' and r², are frequently used to quantify LD between pairs of markers (pairwise LD). Despite their successful use in many applications, these measures cannot quantify LD among multiple markers, and they require allele frequency information derived from haplotype data. In this study, a clustering algorithm is proposed that groups SNPs according to a multilocus LD measure based on information theory. Tag SNPs are then selected within each cluster, optimizing criteria such as the number of tag SNPs and prediction accuracy. The experimental results show that the new LD measure can be applied directly to genotype data from the HapMap project, which avoids the cost of haplotyping. More importantly, the proposed method significantly improves the efficiency and prediction accuracy of tag SNP selection.
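The contrast between pairwise and multilocus LD can be illustrated with a small sketch. The abstract does not give the authors' formula, so the multilocus part below uses total correlation (multi-information), sum_i H(X_i) - H(X_1, ..., X_k), a standard entropy-based association measure, as an assumed stand-in. The function names `entropy`, `pairwise_ld` and `multi_information` are hypothetical. `pairwise_ld` needs phased two-locus haplotypes, whereas `multi_information` works on unphased genotype codes (0/1/2), which is the property the abstract emphasizes.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of hashable samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def pairwise_ld(haplotypes):
    """Classical (signed) Lewontin D' and r^2 for two biallelic loci.

    `haplotypes` is a list of phased (allele_at_locus1, allele_at_locus2)
    pairs coded 0/1; pairwise measures need this haplotype-level data.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n  # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n  # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # haplotype 1-1 frequency
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalization)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


def multi_information(genotypes):
    """Total correlation sum_i H(X_i) - H(X_1..X_k), in bits.

    ASSUMPTION: stand-in for the paper's (unspecified) multilocus LD measure.
    `genotypes` is a list of per-individual tuples of unphased genotype
    codes (0, 1, 2 = copies of one allele), one entry per SNP, so no
    haplotype phasing is required.
    """
    k = len(genotypes[0])
    marginal = sum(entropy([g[i] for g in genotypes]) for i in range(k))
    joint = entropy([tuple(g) for g in genotypes])
    return marginal - joint
```

For example, `pairwise_ld([(0, 0), (0, 0), (1, 1), (1, 1)])` returns `(1.0, 1.0)` (complete LD between the two loci), while `multi_information` returns 3.0 bits for three SNPs whose genotype columns are identical and 0.0 for statistically independent SNPs. In a clustering scheme like the one described above, SNPs with high mutual redundancy under such a measure would be grouped together, and a few tags per group would suffice.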

Keywords: Tag SNP; clustering algorithms; entropy; linkage disequilibrium (LD).

MeSH terms

  • Algorithms*
  • Alleles
  • Chromosomes, Human
  • Cluster Analysis
  • Haplotypes
  • Humans
  • Linkage Disequilibrium*
  • Multilocus Sequence Typing / methods*
  • Polymorphism, Single Nucleotide*