TRM: a powerful two-stage machine learning approach for identifying SNP-SNP interactions

Ann Hum Genet. 2012 Jan;76(1):53-62. doi: 10.1111/j.1469-1809.2011.00692.x. Epub 2011 Dec 11.

Abstract

Studies have shown that interactions of single nucleotide polymorphisms (SNPs) may play an important role in understanding the causes of complex disease. We propose an integrated method that combines two machine-learning methods, Random Forests (RF) and Multivariate Adaptive Regression Splines (MARS), to identify a subset of important SNPs and to detect interaction patterns more effectively and efficiently. In this two-stage RF-MARS (TRM) approach, RF is first applied to select a predictive subset of SNPs, and MARS is then used to identify the interaction patterns within that subset. We evaluated the performance of TRM under four models. RF variable selection was based on either the out-of-bag classification error rate (OOB) or the variable importance spectrum (IS). Our results indicate that RF(OOB) performed better than MARS and RF(IS) in detecting important variables. This study demonstrates that TRM(OOB), which is RF(OOB) followed by MARS, combines the strengths of RF and MARS in identifying SNP-SNP interactions in a scenario of 100 candidate SNPs. TRM(OOB) had a higher true positive rate and a lower false positive rate than MARS alone, particularly when searching for interactions strongly associated with the outcome. Therefore, TRM(OOB) is favored for exploring SNP-SNP interactions in large-scale genetic variation studies.
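To make the quantities in the abstract concrete, the sketch below counts the candidate pairwise interactions among 100 SNPs and computes a true positive rate and a false positive rate for a set of detected pairs. It is an illustration only, not the authors' code. The function name `interaction_rates`, the SNP indices, the detected pairs, and the subset size of 10 are all hypothetical, and the abstract does not state how the rates were defined, so the definition here (rates taken over all unordered SNP pairs) is an assumption.

```python
from math import comb


def interaction_rates(detected, true_pairs, n_snps):
    """Return (TPR, FPR) for a set of detected SNP-SNP pairs.

    Assumed definition: rates are taken over all unordered SNP pairs.
    Each pair is normalised to a frozenset so (a, b) equals (b, a).
    """
    detected = {frozenset(p) for p in detected}
    true_pairs = {frozenset(p) for p in true_pairs}
    n_candidates = comb(n_snps, 2)            # all unordered pairs
    n_negatives = n_candidates - len(true_pairs)
    tp = len(detected & true_pairs)           # true pairs that were found
    fp = len(detected - true_pairs)           # found pairs that are not true
    return tp / len(true_pairs), fp / n_negatives


# Hypothetical scenario: 100 candidate SNPs, two truly interacting pairs.
n_snps = 100
true_pairs = [(3, 17), (42, 58)]
detected = [(3, 17), (5, 9)]  # made-up output of some detection method

tpr, fpr = interaction_rates(detected, true_pairs, n_snps)
print(comb(n_snps, 2))  # pairwise interactions among all 100 SNPs
print(comb(10, 2))      # pairs left if stage 1 kept 10 SNPs (assumed size)
print(tpr, fpr)
```

With 100 SNPs there are 4950 possible pairs; if a first-stage filter kept 10 SNPs, the second stage would search only 45. This is the kind of reduction a two-stage design aims for, at the risk of discarding a truly interacting SNP in the first stage.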

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Decision Trees
  • Genotype
  • Humans
  • Male
  • Models, Genetic*
  • Models, Statistical
  • Polymorphism, Single Nucleotide*
  • Prostatic Neoplasms / pathology
  • Receptors, Estrogen / genetics

Substances

  • Receptors, Estrogen