Combining Random Forests and a Signal Detection Method Leads to the Robust Detection of Genotype-Phenotype Associations

Genes (Basel). 2020 Aug 5;11(8):892. doi: 10.3390/genes11080892.

Abstract

Genome-wide association studies (GWAS) are a well-established methodology for identifying genomic variants and genes responsible for traits of interest in all branches of the life sciences. Although this methodology has had a long time to mature, the reliable detection of genotype-phenotype associations remains a challenge for many quantitative traits, mainly because of the large number of genomic loci with weak individual effects on the trait under investigation. It can therefore be hypothesized that many genomic variants with a small but real effect remain unnoticed in many GWAS approaches. Here, we propose a two-step procedure to address this problem. In the first step, cubic splines are fitted to the test statistic values, and genomic regions with spline peaks that are higher than expected by chance are considered quantitative trait loci (QTL). In the second step, the SNPs in these QTL are prioritized with respect to the strength of their association with the phenotype using a Random Forests approach. As a case study, we apply our procedure to real data sets and find trustworthy numbers of genomic variants and genes, some of them novel, involved in various egg quality traits.
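The signal detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how "higher than expected by chance" is assessed, so the permutation-based threshold used here is an assumption, as are the least-squares cubic regression spline with evenly spaced knots, the simulated test statistics, and all function and parameter names (`cubic_spline_basis`, `fit_spline`, `detect_qtl_regions`, `simulate_scan`, `n_knots`, `n_perm`, `alpha`).

```python
import numpy as np


def cubic_spline_basis(pos, n_knots=30):
    """Truncated-power basis of a cubic regression spline (evenly spaced knots)."""
    pos = np.asarray(pos, dtype=float)
    # Rescale positions to [0, 1] to keep the basis numerically manageable.
    t = (pos - pos.min()) / (pos.max() - pos.min())
    knots = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]  # interior knots only
    columns = [np.ones_like(t), t, t**2, t**3]
    columns += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(columns)


def fit_spline(basis, stat):
    """Least-squares fit of the spline; returns the fitted curve at each SNP."""
    coef, *_ = np.linalg.lstsq(basis, stat, rcond=None)
    return basis @ coef


def detect_qtl_regions(pos, stat, n_knots=30, n_perm=200, alpha=0.05, seed=0):
    """Return the fitted spline, a chance threshold, and regions above it.

    ASSUMPTION: the threshold is the (1 - alpha) quantile of the maximum
    spline height obtained after permuting the statistics across positions.
    """
    pos = np.asarray(pos)
    stat = np.asarray(stat, dtype=float)
    rng = np.random.default_rng(seed)
    basis = cubic_spline_basis(pos, n_knots)
    fitted = fit_spline(basis, stat)

    # Permuting destroys local clustering of high statistics along the genome.
    null_max = np.array(
        [fit_spline(basis, rng.permutation(stat)).max() for _ in range(n_perm)]
    )
    threshold = float(np.quantile(null_max, 1.0 - alpha))

    # Merge consecutive above-threshold SNPs into candidate regions.
    regions, start = [], None
    for i, above in enumerate(fitted > threshold):
        if above and start is None:
            start = i
        elif not above and start is not None:
            regions.append((pos[start], pos[i - 1]))
            start = None
    if start is not None:
        regions.append((pos[start], pos[-1]))
    return fitted, threshold, regions


def simulate_scan(n_snps=1000, seed=1):
    """Hypothetical data: null -log10(p) values plus one smooth association peak."""
    rng = np.random.default_rng(seed)
    pos = np.arange(n_snps)
    # -log10 of a uniform p-value is exponential with scale 1 / ln(10).
    stat = rng.exponential(scale=1.0 / np.log(10.0), size=n_snps)
    stat += 4.0 * np.exp(-0.5 * ((pos - n_snps // 2) / 25.0) ** 2)
    return pos, stat


pos, stat = simulate_scan()
fitted, threshold, regions = detect_qtl_regions(pos, stat)
print(f"chance threshold for spline peaks: {threshold:.2f}")
for start, end in regions:
    print(f"candidate QTL: SNP index {start} to {end}")
```

In a workflow like the one the abstract outlines, the SNPs inside each reported region would then be passed on to the Random Forests prioritization step. The number of knots controls how narrow a peak the spline can follow, so it would need tuning to the marker density of a real data set.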

Keywords: Random Forests; boruta; egg weight; eggshell strength; genome wide association studies; signal detection.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Chickens / genetics*
  • Eggs / standards
  • Gene-Environment Interaction*
  • Genome-Wide Association Study / methods*
  • Machine Learning
  • Quantitative Trait Loci