A survey of genomic properties for the detection of regulatory polymorphisms

PLoS Comput Biol. 2007 Jun;3(6):e106. doi: 10.1371/journal.pcbi.0030106. Epub 2007 Apr 25.

Abstract

Advances in the computational identification of functional noncoding polymorphisms will aid in cataloging novel determinants of health and identifying genetic variants that explain human evolution. To date, however, the development and evaluation of such techniques have been limited by the availability of known regulatory polymorphisms. We have attempted to address this by assembling, from the literature, a computationally tractable set of regulatory polymorphisms within the ORegAnno database (http://www.oreganno.org). We have further used 104 regulatory single-nucleotide polymorphisms from this set and 951 polymorphisms of unknown function, from 2-kb and 152-bp noncoding upstream regions of genes, to investigate the discriminatory potential of 23 properties related to gene regulation and population genetics. Among the most important properties detected in this region are distance to transcription start site, local repetitive content, sequence conservation, minor and derived allele frequencies, and presence of a CpG island. We further used the entire set of properties to evaluate their collective performance in detecting regulatory polymorphisms. Using a 10-fold cross-validation approach, we were able to achieve a sensitivity and specificity of 0.82 and 0.71, respectively, and we show that this performance is strongly influenced by the distance to the transcription start site.
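As an illustration of the evaluation terms used above, the following is a minimal sketch of how sensitivity and specificity are computed and how a 10-fold split can preserve a 104-to-951 class imbalance. It is not the authors' pipeline: the abstract does not name the classifier or the fold-assignment scheme, so the function names `sensitivity_specificity` and `stratified_folds` and the round-robin stratification are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Label 1 stands for a known regulatory SNP, label 0 for a SNP of
    unknown function (labeling convention assumed for this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    return tp / (tp + fn), tn / (tn + fp)


def stratified_folds(labels: Sequence[int], k: int = 10) -> List[List[int]]:
    """Assign sample indices to k folds, dealing each class out round-robin.

    Every fold therefore keeps roughly the overall class ratio, which
    matters when positives are much rarer than negatives.
    """
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


# Class sizes taken from the abstract: 104 regulatory, 951 unknown function.
labels = [1] * 104 + [0] * 951
folds = stratified_folds(labels, k=10)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

In each round of cross-validation, one fold would be held out for testing and the remaining nine used for training; pooling the held-out predictions and passing them to `sensitivity_specificity` yields figures of the kind reported above.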

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Chromosome Mapping / methods*
  • DNA Mutational Analysis / methods*
  • Databases, Genetic*
  • Molecular Sequence Data
  • Polymorphism, Single Nucleotide / genetics*
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Sequence Analysis, DNA / methods*