IW-Scoring: an Integrative Weighted Scoring framework for annotating and prioritizing genetic variations in the noncoding genome

Nucleic Acids Res. 2018 May 4;46(8):e47. doi: 10.1093/nar/gky057.

Abstract

The vast majority of germline and somatic variations occur in the noncoding part of the genome, only a small fraction of which are believed to be functional. From the tens of thousands of noncoding variations detectable in each genome, identifying and prioritizing driver candidates with putative functional significance is challenging. To address this, we implemented IW-Scoring, a new Integrative Weighted Scoring model to annotate and prioritize functionally relevant noncoding variations. We evaluate 11 scoring methods and apply an unsupervised spectral approach to selectively integrate them into two linear weighted functional scoring schemas, one for known and one for novel variations. IW-Scoring delivers stable, high-quality performance and is the best predictor on three independent data sets. We demonstrate the robustness of IW-Scoring in identifying recurrent functional mutations in the TERT promoter, as well as disease SNPs in proximity to consensus motifs and with gene regulatory effects. Using follicular lymphoma as a paradigmatic cancer model, we apply IW-Scoring to locate 11 recurrently mutated noncoding regions in 14 follicular lymphoma genomes, and validate 9 of these regions in an extension cohort, including the promoter and enhancer regions of PAX5. Overall, IW-Scoring demonstrates greater versatility in identifying trait- and disease-associated noncoding variants. Scores from IW-Scoring as well as other methods are freely available from http://www.snp-nexus.org/IW-Scoring/.
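
The idea of combining several scoring methods into one linear weighted score, with weights chosen without labels, can be illustrated with a small sketch. The abstract does not spell out the spectral procedure, so everything below is an assumption made for illustration: weights are taken from the leading eigenvector of the correlation matrix between methods, scores are z-scored before combining, and the function names `spectral_weights` and `integrated_score` and the synthetic data are hypothetical. It should not be read as the IW-Scoring implementation.

```python
import numpy as np


def spectral_weights(scores):
    """Derive one weight per method from an (n_variants, n_methods) score matrix.

    Assumption: methods that agree with the consensus of the others load
    strongly on the leading eigenvector of the between-method correlation matrix.
    """
    corr = np.corrcoef(scores, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last column
    # of eigvecs is the leading eigenvector.
    eigvals, eigvecs = np.linalg.eigh(corr)
    lead = eigvecs[:, -1]
    # An eigenvector's sign is arbitrary; orient it so most loadings are positive.
    if lead.sum() < 0:
        lead = -lead
    # Drop methods that load negatively, then normalise the weights to sum to 1.
    w = np.clip(lead, 0.0, None)
    return w / w.sum()


def integrated_score(scores, weights):
    """Linear weighted combination of z-scored method outputs.

    Assumes every method column has non-zero variance.
    """
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return z @ weights


# Synthetic demonstration (hypothetical data, not from the paper):
# three methods track a hidden functional signal with increasing noise,
# and a fourth method is pure noise.
rng = np.random.default_rng(0)
n_variants = 2000
latent = rng.normal(size=n_variants)
noise_levels = [0.3, 0.6, 1.0]
columns = [latent + s * rng.normal(size=n_variants) for s in noise_levels]
columns.append(rng.normal(size=n_variants))
scores = np.column_stack(columns)

weights = spectral_weights(scores)
combined = integrated_score(scores, weights)
```

In this toy setting the uninformative method receives the smallest weight and the least noisy method outweighs the noisiest one, which is the behaviour one would want from a selective integration step. Real annotation scores differ in scale, direction, and coverage, so they would need harmonising before any such weighting.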

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • DNA, Intergenic / genetics*
  • Databases, Nucleic Acid / statistics & numerical data
  • Genetic Variation*
  • Genome, Human
  • Genome-Wide Association Study / statistics & numerical data
  • Humans
  • Lymphoma, Follicular / genetics
  • Models, Genetic
  • Mutation
  • Neoplasms / genetics
  • Polymorphism, Single Nucleotide
  • Promoter Regions, Genetic
  • Regulatory Sequences, Nucleic Acid*
  • Telomerase / genetics
  • Whole Genome Sequencing / statistics & numerical data

Substances

  • DNA, Intergenic
  • TERT protein, human
  • Telomerase