Random forests on distance matrices for imaging genetics studies

Stat Appl Genet Mol Biol. 2013 Dec;12(6):757-86. doi: 10.1515/sagmb-2013-0040.

Abstract

We propose a non-parametric regression methodology, Random Forests on Distance Matrices (RFDM), for detecting genetic variants associated with quantitative phenotypes that are obtained using neuroimaging techniques and represent the structure or function of the human brain. RFDM, an extension of decision forests, takes as its response a distance matrix that encodes all pairwise phenotypic distances in the random sample. We discuss how to learn such distances directly from the data using manifold learning techniques, and how to define them when the phenotypes are non-vectorial objects such as brain connectivity networks. We also describe an extension of RFDM that detects epistatic effects while keeping the computational complexity low. Extensive simulation results and an application to an imaging genetics study of Alzheimer's disease are presented and discussed.
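
To make the idea of a distance matrix as the response concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes vector-valued phenotypes, Euclidean distances, SNP genotypes coded 0/1/2, and a split criterion based on within-node dispersion, here taken to be the sum of squared pairwise distances divided by twice the node size. For Euclidean distances this equals the sum of squared deviations from the node centroid, so it is a plausible stand-in for whatever criterion the paper uses. The function names (pairwise_distances, dispersion, best_split) and the simulated data are hypothetical.

```python
import numpy as np


def pairwise_distances(phenotypes):
    """Euclidean distance matrix between the rows of an (n, q) array."""
    diffs = phenotypes[:, None, :] - phenotypes[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def dispersion(dist, idx):
    """Within-node dispersion computed from distances only.

    Sum of squared pairwise distances over all ordered pairs in idx,
    divided by 2 * len(idx). For Euclidean distances this equals the
    sum of squared deviations from the node centroid.
    """
    if len(idx) == 0:
        return 0.0
    sub = dist[np.ix_(idx, idx)]
    return float((sub ** 2).sum() / (2.0 * len(idx)))


def best_split(dist, genotypes):
    """Search all SNPs and thresholds for the largest dispersion reduction.

    Assumed coding: genotypes are minor-allele counts in {0, 1, 2}.
    Returns (snp_index, threshold, gain); snp_index is None if no split helps.
    """
    n, p = genotypes.shape
    parent = dispersion(dist, np.arange(n))
    best = (None, None, 0.0)
    for j in range(p):
        for threshold in (0, 1):
            left = np.flatnonzero(genotypes[:, j] <= threshold)
            right = np.flatnonzero(genotypes[:, j] > threshold)
            if len(left) == 0 or len(right) == 0:
                continue
            gain = parent - dispersion(dist, left) - dispersion(dist, right)
            if gain > best[2]:
                best = (j, threshold, gain)
    return best


# Simulated toy data (hypothetical): 60 subjects, 5 SNPs, 4-dimensional phenotype.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(60, 5))
phenotypes = rng.normal(size=(60, 4))
phenotypes[:, 0] += 2.0 * genotypes[:, 2]  # SNP 2 shifts the first phenotype dimension

dist = pairwise_distances(phenotypes)
snp, threshold, gain = best_split(dist, genotypes)
print("selected SNP:", snp, "threshold:", threshold, "gain:", round(gain, 2))
```

Because the split search touches only the matrix dist, the same code would apply unchanged to distances learned by manifold learning or defined between non-vectorial objects such as connectivity networks. A forest would repeat this search recursively on bootstrap samples with random subsets of SNPs at each node; that part is omitted here.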

MeSH terms

  • Algorithms
  • Alzheimer Disease / genetics
  • Alzheimer Disease / physiopathology
  • Artificial Intelligence
  • Brain / pathology
  • Brain / physiopathology
  • Case-Control Studies
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Decision Trees
  • Epistasis, Genetic
  • Genetic Association Studies
  • Humans
  • Models, Genetic*
  • Neuroimaging*
  • Organ Size / genetics
  • Phenotype
  • Polymorphism, Single Nucleotide
  • ROC Curve