regBase: whole genome base-wise aggregation and functional prediction for human non-coding regulatory variants

Nucleic Acids Res. 2019 Dec 2;47(21):e134. doi: 10.1093/nar/gkz774.

Abstract

Predicting functional or pathogenic regulatory variants in the human non-coding genome facilitates the interpretation of disease causation. While numerous prediction methods are available, their performance is inconsistent or restricted to specific tasks, which creates a need for comprehensive integration of those methods. Here, we compile a whole-genome base-wise aggregation, regBase, that incorporates the largest set of prediction scores. Building on different assumptions of causality, we train three composite models to score functional, pathogenic and cancer driver non-coding regulatory variants, respectively. We demonstrate the superior and stable performance of our models using independent benchmarks and show that they successfully fine-map causal regulatory variants at specific loci or at base-wise resolution. We believe that the regBase database, together with the three composite models, will be useful in different areas of human genetic studies, such as annotation-based causal variant fine-mapping, pathogenic variant discovery and cancer driver mutation identification. regBase is freely available at https://github.com/mulinlab/regBase.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Genetic*
  • Datasets as Topic
  • Genome, Human*
  • Genome-Wide Association Study / methods*
  • Humans
  • Neoplasms / genetics
  • Polymorphism, Single Nucleotide / genetics
  • Software*