SNPs selection using support vector regression and genetic algorithms in GWAS

BMC Genomics. 2014;15 Suppl 7(Suppl 7):S4. doi: 10.1186/1471-2164-15-S7-S4. Epub 2014 Oct 27.

Abstract

Introduction: This paper proposes a new methodology to simultaneously select the most relevant SNP markers for characterizing any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The methodology is multi-attribute, in that it considers several markers simultaneously to explain the phenotype, and it combines statistical tools, machine learning and computational intelligence.
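The wrapper idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the kernel uses the commonly cited form of the Pearson VII universal kernel, the parameters `sigma` and `omega` and all GA settings are hypothetical defaults, the data are randomly simulated, and a PUK-weighted leave-one-out average stands in for Support Vector Regression so that the example runs with the Python standard library only.

```python
import math
import random

def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Commonly cited form of the Pearson VII universal kernel (PUK)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega

def loo_error(mask, X, y):
    """Leave-one-out MSE of a PUK-weighted average predictor.

    Assumption: this simple kernel smoother is a stand-in for SVR; only the
    markers whose bit in `mask` is 1 are used.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    rows = [[row[j] for j in cols] for row in X]
    total = 0.0
    for i in range(len(rows)):
        others = [k for k in range(len(rows)) if k != i]
        weights = [puk_kernel(rows[i], rows[k]) for k in others]
        pred = sum(w * y[k] for w, k in zip(weights, others)) / sum(weights)
        total += (y[i] - pred) ** 2
    return total / len(rows)

def binary_ga(n_markers, fitness, pop_size=20, generations=15,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Binary GA: each chromosome is a 0/1 mask over the markers."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        # Cache fitness values so each distinct mask is evaluated once.
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n_markers)]
           for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_markers)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best

# Hypothetical toy data: 30 individuals, 8 SNPs coded 0/1/2,
# with additive effects on markers 0 and 3 plus small noise.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 2) for _ in range(8)] for _ in range(30)]
y = [2.0 * row[0] - 1.5 * row[3] + data_rng.gauss(0, 0.1) for row in X]

# Fitness is the negated prediction error, so higher is better.
best = binary_ga(8, lambda m: -loo_error(m, X, y))
print("selected markers:", [j for j, bit in enumerate(best) if bit])
```

In a real analysis the fitness would be replaced by a cross-validated accuracy measure of an SVR model trained on the selected markers, which is the role the abstract assigns to SVR with PUK.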

Results: The proposed method showed potential on simulated database 1, which contains additive effects only, and on the real database. Simulated database 1 has a total of 1,000 markers, of which 7 have a major effect on the phenotype and the other 993 SNPs represent noise. On this database the method identified 21 markers: 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, the initial 50,752 SNPs were reduced to 3,073 markers, increasing the accuracy of the model. On simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS.
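The counts reported above can be turned into the usual selection metrics. The short calculation below is only a worked restatement of the reported numbers; the terms "precision" and "recall" are standard definitions applied here for illustration and are not taken from the abstract itself.

```python
relevant = 7                      # SNPs with major effect in simulated database 1
selected = 21                     # markers identified by the method
true_pos = 5                      # identified markers among the 7 relevant SNPs
false_pos = selected - true_pos   # 16 false positives

precision = true_pos / selected   # share of selected markers that are relevant
recall = true_pos / relevant      # share of relevant markers that were found

# Real database: fraction of the 50,752 SNPs retained after selection
retained = 3073 / 50752

print(f"precision={precision:.3f} recall={recall:.3f} retained={retained:.2%}")
# prints: precision=0.238 recall=0.714 retained=6.05%
```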

Conclusions: The proposed method is effective in explaining the real phenotype (PTA for milk): applying the wrapper based on a genetic algorithm and Support Vector Regression with the Pearson Universal kernel (PUK) eliminated many redundant markers, improving the prediction and accuracy of the model on the real database without quality control filters. The PUK also demonstrated that it can replicate the performance of the linear and RBF kernels.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Artificial Intelligence
  • Cattle / genetics
  • Computational Biology
  • Computer Simulation
  • Databases, Nucleic Acid
  • Female
  • Genetic Markers
  • Genetic Techniques
  • Genome-Wide Association Study / methods*
  • Male
  • Models, Statistical
  • Phenotype
  • Polymorphism, Single Nucleotide*
  • Software

Substances

  • Genetic Markers