Using tree-based methods for detection of gene-gene interactions in the presence of a polygenic signal: simulation study with application to educational attainment in the Generation Scotland Cohort Study

Bioinformatics. 2019 Jan 15;35(2):181-188. doi: 10.1093/bioinformatics/bty462.

Abstract

Motivation: The genomic architecture of human complex diseases is thought to be attributable to single markers, polygenic components and epistatic components. No study has examined the ability of tree-based methods to detect epistasis in the presence of a polygenic signal. We sought to apply two decision tree-based methods, C5.0 and logic regression, to detect epistasis under several simulated conditions, varying the strength of interaction and the linkage disequilibrium (LD) structure. We then applied the same methods to the phenotype of educational attainment in a large population cohort.
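To make the simulation setting concrete, the sketch below generates a trait with an additive polygenic background plus a single SNP-SNP interaction. It is a hypothetical illustration, not the authors' simulation code: the function `simulate_phenotype`, the continuous trait, independent SNPs (no LD), a common minor allele frequency, and the parameters `polygenic_h2`, `n_causal` and `interaction_effect` are all assumptions made for illustration.

```python
import numpy as np

def simulate_phenotype(n_individuals, n_snps, maf=0.3, n_causal=100,
                       polygenic_h2=0.2, interaction_effect=0.5,
                       pair=(0, 1), seed=0):
    """Hypothetical simulator: polygenic background + one SNP-SNP interaction.

    Assumes independent SNPs (no LD) sharing one minor allele frequency and
    a continuous trait; these are illustrative choices, not the paper's design.
    """
    rng = np.random.default_rng(seed)
    # Genotypes coded as 0/1/2 copies of the minor allele.
    genotypes = rng.binomial(2, maf, size=(n_individuals, n_snps))
    # Standardise each SNP to mean 0 and variance 1 under Hardy-Weinberg.
    std = (genotypes - 2 * maf) / np.sqrt(2 * maf * (1 - maf))
    # Polygenic component: many small additive effects on random SNPs.
    causal = rng.choice(n_snps, size=n_causal, replace=False)
    betas = rng.normal(0.0, np.sqrt(polygenic_h2 / n_causal), size=n_causal)
    polygenic = std[:, causal] @ betas
    # Epistatic component: product of the two standardised interacting SNPs.
    a, b = pair
    epistatic = interaction_effect * std[:, a] * std[:, b]
    # Residual (environmental) noise.
    noise = rng.normal(0.0, np.sqrt(1 - polygenic_h2), size=n_individuals)
    return genotypes, polygenic + epistatic + noise

G, y = simulate_phenotype(2000, 500)
```

Setting `interaction_effect=0.0` gives a null scenario (polygenic signal only), which is the kind of condition under which a type I error rate can be estimated; setting `polygenic_h2` close to zero removes the polygenic background.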

Results: LD pruning improved the power and reduced the type I error. C5.0 had a conservative type I error rate, whereas logic regression had a type I error rate that exceeded 5%. Despite its more conservative type I error rate, C5.0 was observed to have higher power than logic regression across several conditions. In the presence of a polygenic signal, power was generally reduced. Applying both methods to educational attainment in a large population cohort yielded numerous interacting SNPs, notably a SNP in RCAN3 that is associated with reading and spelling, and a SNP in NPAS3, a neurodevelopmental gene.
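LD pruning removes SNPs that are highly correlated with SNPs already retained, so that a tree-based method does not split on many redundant proxies of the same signal. The sketch below is a simplified greedy pruner in the spirit of PLINK's `--indep-pairwise`, but it is not identical to that tool, and the abstract does not state which pruning procedure or thresholds were used; the function `ld_prune` and its default `r2_threshold` and `window` values are hypothetical.

```python
import numpy as np

def ld_prune(genotypes, r2_threshold=0.2, window=50):
    """Hypothetical greedy LD pruning.

    Scans SNPs (columns) in order and keeps a SNP only if its squared
    Pearson correlation with every already-kept SNP that lies within
    `window` columns before it is at most `r2_threshold`.
    Returns the indices of the kept SNPs.
    """
    n_snps = genotypes.shape[1]
    kept = []
    for j in range(n_snps):
        x = genotypes[:, j].astype(float)
        if x.std() == 0:
            continue  # monomorphic SNPs carry no information; drop them
        keep = True
        for k in kept:
            if j - k > window:
                continue  # only compare SNPs that are close in position
            r = np.corrcoef(x, genotypes[:, k])[0, 1]
            if r * r > r2_threshold:
                keep = False  # in LD with a SNP we already kept
                break
        if keep:
            kept.append(j)
    return kept
```

For example, if column 1 is an exact copy of column 0, `ld_prune` keeps column 0 and discards column 1, while an independent SNP in column 2 is retained. Processing SNPs in column order is a simplification; real tools also weigh allele frequency and use sliding windows defined in base pairs or variant counts.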

Availability and implementation: All methods used are implemented and freely available in R.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adaptor Proteins, Signal Transducing / genetics
  • Basic Helix-Loop-Helix Transcription Factors
  • Cohort Studies
  • Computational Biology
  • Decision Trees
  • Epistasis, Genetic*
  • Genetic Markers
  • Genetics, Population / methods*
  • Humans
  • Linkage Disequilibrium
  • Multifactorial Inheritance*
  • Nerve Tissue Proteins / genetics
  • Polymorphism, Single Nucleotide
  • Scotland
  • Software*
  • Transcription Factors / genetics

Substances

  • Adaptor Proteins, Signal Transducing
  • Basic Helix-Loop-Helix Transcription Factors
  • Genetic Markers
  • NPAS3 protein, human
  • Nerve Tissue Proteins
  • RCAN3 protein, human
  • Transcription Factors