FHSA-SED: Two-Locus Model Detection for Genome-Wide Association Study with Harmony Search Algorithm

PLoS One. 2016 Mar 25;11(3):e0150669. doi: 10.1371/journal.pone.0150669. eCollection 2016.

Abstract

Motivation: The two-locus model is a typical and significant disease model to be identified in genome-wide association studies (GWAS). Because of the intensive computational burden and the diversity of disease models, existing methods suffer from low detection power, high computational cost, and a preference for certain types of disease models.

Method: In this study, two scoring functions (the Bayesian-network-based K2-score and the Gini-score) are used to characterize a pair of SNP loci as a candidate model. The two criteria are applied simultaneously to improve identification power and to counter the preference for particular disease models. The harmony search algorithm (HSA) is improved to quickly find the most likely candidate models among all two-locus models; it includes a local search algorithm with a two-dimensional tabu table that avoids repeatedly evaluating disease models that have strong marginal effects. Finally, the G-test statistic is used to further test the candidate models.
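
As an illustration of the scoring and testing steps, the following minimal Python sketch builds the 9 x 2 genotype-by-status contingency table for one SNP pair, then computes a Gini-style score and the standard G statistic, G = 2 * sum(O * ln(O / E)). The abstract does not give the exact score definitions, so the Gini-score here is an assumption (the sample-weighted Gini impurity over genotype combinations, where lower values mean purer case/control separation). The function names and the 0/1/2 genotype coding are hypothetical.

```python
import math
from collections import Counter


def contingency_table(snp_a, snp_b, status):
    """Count samples for each genotype pair and class.

    Assumed coding: genotypes are 0/1/2, status is 0 (control) or 1 (case).
    Returns 9 rows (one per genotype combination) of [controls, cases].
    """
    counts = Counter(zip(snp_a, snp_b, status))
    return [[counts[(a, b, c)] for c in (0, 1)]
            for a in range(3) for b in range(3)]


def gini_score(table):
    """Sample-weighted Gini impurity over genotype combinations.

    This definition is an assumption; the paper's Gini-score may differ.
    """
    total = sum(sum(row) for row in table)
    score = 0.0
    for row in table:
        n = sum(row)
        if n == 0:
            continue  # empty genotype combinations contribute nothing
        impurity = 1.0 - sum((c / n) ** 2 for c in row)
        score += (n / total) * impurity
    return score


def g_statistic(table):
    """G = 2 * sum(O * ln(O / E)), with E from the row and column totals."""
    total = sum(sum(row) for row in table)
    col_totals = [sum(row[j] for row in table) for j in range(2)]
    g = 0.0
    for row in table:
        n = sum(row)
        for j, observed in enumerate(row):
            if observed > 0:  # cells with O = 0 contribute zero
                expected = n * col_totals[j] / total
                g += observed * math.log(observed / expected)
    return 2.0 * g
```

In a search loop, a candidate pair with a low Gini-style score would be kept and its G statistic compared against a chi-square distribution to obtain a p-value. The K2-score and the harmony search itself are not sketched here.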

Results: We evaluate our method, named FHSA-SED, on 82 simulated datasets and a real AMD dataset, and compare it with two typical, recently developed methods (MACOED and CSE) that are based on swarm intelligence search algorithms. The simulation results indicate that our method outperforms the two compared algorithms in terms of detection power, computation time, evaluation times, sensitivity (TPR), specificity (SPC), positive predictive value (PPV), and accuracy (ACC). Our method identified two SNPs (rs3775652 and rs10511467) that may also be associated with disease in the AMD dataset.
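
For reference, the four classification measures named above follow the usual confusion-matrix definitions. The sketch below computes them from raw counts. How the study assigns true and false positives to detected SNP pairs is not stated in the abstract, so the function name and its inputs are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return TPR, SPC, PPV and ACC from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Avoid division by zero when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),                   # sensitivity
        "SPC": ratio(tn, tn + fp),                   # specificity
        "PPV": ratio(tp, tp + fp),                   # positive predictive value
        "ACC": ratio(tp + tn, tp + fp + tn + fn),    # accuracy
    }
```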

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem*
  • Computer Simulation
  • Epistasis, Genetic
  • Genetic Predisposition to Disease*
  • Genome-Wide Association Study / methods
  • Genome-Wide Association Study / statistics & numerical data*
  • Humans
  • Machine Learning
  • Polymorphism, Single Nucleotide / genetics

Grants and funding

This work was supported by the Natural Science Foundation of China under Grants 61571341, 61201312, 91530113, and 11401357; the Research Fund for the Doctoral Program of Higher Education of China (No. 2013 0203110017); the Fundamental Research Funds for the Central Universities of China (Nos. BDY171416 and JB140306); and the Natural Science Foundation of Shaanxi Province in China (2015JM6275).