Mapping complex traits using Random Forests

Alexandre Bureau; Josée Dupuis; Brooke Hayward; Kathleen Falls; Paul Van Eerdewegh

doi:10.1186/1471-2156-4-S1-S64

BMC Genet. 2003 Dec 31;4 Suppl 1(Suppl 1):S64. doi: 10.1186/1471-2156-4-S1-S64.

Authors

Alexandre Bureau¹, Josée Dupuis, Brooke Hayward, Kathleen Falls, Paul Van Eerdewegh

Affiliation

¹ Genome Therapeutics Corporation, Waltham, Massachusetts 02453, USA. alexandre.bureau@uleth.ca

Abstract

Random Forest is a prediction technique that grows trees on bootstrap samples of the data and, at each node, selects the best split from a random selection of the explanatory variables. For a quantitative outcome, each tree predictor takes on a numerical value. We applied Random Forest to the first replicate of the Genetic Analysis Workshop 13 simulated data set, with sibling pairs as the units of analysis and identity by descent (IBD) at selected loci as the explanatory variables. Knowing the true model, we performed two sets of analyses on three phenotypes: HDL, triglycerides, and glucose. The goal was to approach the mapping of complex traits from a multivariate perspective. The first set of analyses mimics a candidate gene approach with a high proportion of true genes among the predictors, while the second set represents a genome scan analysis using microsatellite markers. Random Forest identified a few of the major genes influencing some phenotypes, such as baseline HDL and triglycerides, but failed to identify the major genes regulating baseline glucose levels.
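
The general technique named in the abstract (trees grown on bootstrap samples, with a random selection of predictors tried at each node, and a numerical prediction for a quantitative outcome) can be illustrated with a minimal sketch. This is not the authors' software or analysis. The function names, the parameters (`mtry`, `max_depth`, `min_size`), the synthetic IBD values, and the simulated outcome are all hypothetical choices made for illustration only.

```python
import random
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)

def grow_tree(X, y, mtry, rng, depth=0, max_depth=3, min_size=5):
    """Grow a regression tree; each node considers only `mtry` random predictors."""
    if depth >= max_depth or len(y) < 2 * min_size:
        return mean(y)  # leaf: numerical prediction
    best = None
    for j in rng.sample(range(len(X[0])), mtry):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return mean(y)  # no admissible split among the sampled predictors
    _, j, t, left, right = best
    lo = grow_tree([X[i] for i in left], [y[i] for i in left],
                   mtry, rng, depth + 1, max_depth, min_size)
    hi = grow_tree([X[i] for i in right], [y[i] for i in right],
                   mtry, rng, depth + 1, max_depth, min_size)
    return (j, t, lo, hi)

def predict_tree(tree, row):
    """Follow splits until a leaf (a plain number) is reached."""
    while isinstance(tree, tuple):
        j, t, lo, hi = tree
        tree = lo if row[j] <= t else hi
    return tree

def random_forest(X, y, n_trees=50, mtry=2, seed=0):
    """Grow each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, row):
    """Average the numerical predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)

# Hypothetical data: 200 sibling pairs, IBD sharing proportion at 5 loci.
# Only locus 0 influences the made-up quantitative outcome.
data_rng = random.Random(1)
X = [[data_rng.choices([0.0, 0.5, 1.0], weights=[1, 2, 1])[0] for _ in range(5)]
     for _ in range(200)]
y = [2.0 - 1.5 * row[0] + data_rng.gauss(0, 0.3) for row in X]

forest = random_forest(X, y)
low = predict_forest(forest, [0.0, 0.5, 0.5, 0.5, 0.5])   # no sharing at locus 0
high = predict_forest(forest, [1.0, 0.5, 0.5, 0.5, 0.5])  # full sharing at locus 0
print(low, high)
```

In this toy setting the forest's prediction changes with IBD sharing at the one simulated causal locus, which is the kind of signal a multivariate mapping analysis would look for. Established implementations also report variable importance measures, which this sketch omits.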

MeSH terms

Chromosome Mapping / statistics & numerical data*
Chromosomes, Human, Pair 1 / genetics
Chromosomes, Human, Pair 12 / genetics
Chromosomes, Human, Pair 17 / genetics
Chromosomes, Human, Pair 19 / genetics
Chromosomes, Human, Pair 9 / genetics
Computer Simulation / statistics & numerical data
Genetic Markers / genetics
Genome, Human
Humans
Matched-Pair Analysis
Microsatellite Repeats / genetics
Multifactorial Inheritance / genetics*
Multivariate Analysis
Pedigree*
Phenotype
Predictive Value of Tests
Quantitative Trait Loci / genetics*
Quantitative Trait, Heritable*
Siblings
Software / statistics & numerical data

Substances

Genetic Markers