Integrating regulatory features data for prediction of functional disease-associated SNPs

Brief Bioinform. 2019 Jan 18;20(1):26-32. doi: 10.1093/bib/bbx094.

Abstract

Genome-wide association studies (GWASs) are an effective strategy for identifying susceptibility loci for human complex diseases. However, missing heritability remains a major problem. Most GWAS-identified single-nucleotide polymorphisms (SNPs) are located in noncoding regions, which have been considered the unexplored territory of the genome. Recently, data from the Encyclopedia of DNA Elements (ENCODE) and Roadmap Epigenomics projects have shown that many GWAS SNPs in noncoding regions fall within regulatory elements. In this study, we developed a pipeline named functional disease-associated SNPs prediction (FDSP) to identify novel susceptibility loci for complex diseases. The pipeline uses machine learning to interpret the functional features of known disease-associated variants. We applied it to predict novel susceptibility SNPs for type 2 diabetes (T2D) and hypertension. The predicted SNPs could explain heritability beyond that explained by GWAS-associated SNPs. Functional annotation by expression quantitative trait loci analyses showed that the target genes of the predicted SNPs were significantly enriched in T2D- or hypertension-related pathways in multiple tissues. Our results suggest that combining GWAS and regulatory features data could identify additional functional susceptibility SNPs for complex diseases. We hope FDSP can help to identify novel susceptibility loci for complex diseases and address the missing heritability problem.
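The general idea of learning from the regulatory features of known disease-associated SNPs can be illustrated with a minimal sketch. This is not the FDSP implementation: the abstract does not state which features or classifier the pipeline uses, so the annotation tracks, coordinates, SNP positions, and the choice of logistic regression below are all hypothetical assumptions made purely for illustration.

```python
import math

# Hypothetical regulatory tracks (toy data, not real ENCODE/Roadmap coordinates).
# Each track maps to a list of (chrom, start, end) half-open intervals.
TRACKS = {
    "dnase": [("chr1", 100, 200), ("chr1", 500, 600)],
    "h3k27ac": [("chr1", 150, 250), ("chr2", 10, 50)],
    "tfbs": [("chr1", 520, 540)],
}

def featurize(snp, tracks=TRACKS):
    """Return one 0/1 feature per track (tracks in sorted name order):
    1 if the SNP position falls inside any interval of that track."""
    chrom, pos = snp
    return [
        int(any(c == chrom and s <= pos < e for c, s, e in intervals))
        for _, intervals in sorted(tracks.items())
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent; return (weights, bias).
    Logistic regression is an assumed stand-in for the unspecified classifier."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score(snp, w, b):
    """Predicted probability that a SNP resembles the known associated SNPs."""
    x = featurize(snp)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy training set: known associated SNPs (label 1) and background SNPs (label 0).
positives = [("chr1", 160), ("chr1", 530), ("chr1", 180)]
negatives = [("chr1", 300), ("chr2", 500), ("chr1", 900)]
X = [featurize(s) for s in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)
weights, bias = train_logistic(X, y)

# Rank unseen candidate SNPs by predicted score, highest first.
candidates = [("chr2", 700), ("chr1", 550)]
ranked = sorted(candidates, key=lambda s: score(s, weights, bias), reverse=True)
```

In this toy setup, a candidate that overlaps a regulatory track ranks above one that overlaps none. A real analysis would need far more SNPs and annotations, a matched background set, and held-out evaluation.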

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology
  • Diabetes Mellitus, Type 2 / genetics
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study / statistics & numerical data*
  • Humans
  • Hypertension / genetics
  • Machine Learning
  • Models, Genetic
  • Models, Statistical
  • Multifactorial Inheritance
  • Polymorphism, Single Nucleotide*
  • Quantitative Trait Loci
  • Regulatory Sequences, Nucleic Acid
  • Software*