Pathway-based identification of SNPs predictive of survival

Eur J Hum Genet. 2011 Jun;19(6):704-9. doi: 10.1038/ejhg.2011.3. Epub 2011 Feb 2.

Abstract

In recent years, several association analysis methods for case-control studies have been developed. However, as we turn towards the identification of single nucleotide polymorphisms (SNPs) for prognosis, there is a need for methods that identify SNPs in high-dimensional data with survival outcomes. Traditional methods for the identification of SNPs have two main drawbacks. First, the majority of the approaches for case-control studies are based on single SNPs. Second, SNPs that are identified without incorporating biological knowledge are more difficult to interpret. Random forests have been found to perform well in gene expression analysis with survival outcomes. In this paper, we present the first pathway-based method to associate SNPs with survival outcomes using a machine learning algorithm. We illustrate the application of pathway-based analysis of SNPs predictive of survival with a data set of 192 multiple myeloma patients genotyped for 500,000 SNPs. We also present simulation studies showing that the random forests technique with the log-rank score split criterion outperforms several other machine learning algorithms. Thus, pathway-based survival analysis using machine learning tools represents a promising approach for the identification of biologically meaningful SNPs associated with disease.
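To make the idea of a survival-based split concrete, the following is a minimal sketch of how a single tree node might score candidate splits on one SNP. It uses the ordinary two-sample log-rank statistic, which is related to but not the same as the log-rank score criterion named in the abstract, and it is not the authors' implementation. The function names, the 0/1/2 minor-allele coding of genotypes, and the toy data are all assumptions made for illustration.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    times   : follow-up time for each patient
    events  : 1 if death was observed, 0 if the patient was censored
    in_left : True if the patient falls in the left child of the split
    """
    # Only time points with at least one observed death contribute.
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        at_risk_left = sum(1 for u, g in zip(times, in_left) if u >= t and g)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        deaths_left = sum(
            1 for u, e, g in zip(times, events, in_left) if u == t and e and g
        )
        frac_left = at_risk_left / at_risk
        observed_minus_expected += deaths_left - deaths * frac_left
        if at_risk > 1:
            # Hypergeometric variance of the death count in the left child.
            variance += (
                deaths * frac_left * (1 - frac_left)
                * (at_risk - deaths) / (at_risk - 1)
            )
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def best_genotype_split(genotypes, times, events):
    """Pick the genotype cut (g <= cut goes left) with the largest statistic.

    Assumes genotypes are coded 0, 1, 2 (hypothetical coding for this sketch).
    Returns (cut, statistic), or (None, 0.0) if no cut separates the patients.
    """
    best_cut, best_stat = None, 0.0
    for cut in (0, 1):
        in_left = [g <= cut for g in genotypes]
        if all(in_left) or not any(in_left):
            continue  # one child would be empty, so this is not a real split
        stat = logrank_statistic(times, events, in_left)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


if __name__ == "__main__":
    # Toy data: carriers of two minor alleles die early.
    genotypes = [0, 0, 1, 1, 2, 2]
    times = [9, 8, 7, 6, 1, 2]
    events = [1, 1, 1, 1, 1, 1]
    print(best_genotype_split(genotypes, times, events))
```

In a random survival forest, a search of this kind would be repeated at each node over a random subset of SNPs, and restricting that subset to SNPs mapped to one pathway is one way a pathway-based analysis could be set up. That arrangement is an assumption here, not a detail stated in the abstract.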

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Case-Control Studies
  • Computer Simulation
  • Genetic Predisposition to Disease
  • Genotype
  • Humans
  • Metabolic Networks and Pathways / genetics*
  • Models, Genetic*
  • Multiple Myeloma / diagnosis
  • Multiple Myeloma / genetics*
  • Multiple Myeloma / metabolism
  • Multiple Myeloma / mortality
  • Polymorphism, Single Nucleotide*
  • Predictive Value of Tests
  • Survival Analysis