In silico discrimination of single nucleotide polymorphisms and pathological mutations in human gene promoter regions by means of local DNA sequence context and regularity

In Silico Biol. 2006;6(1-2):23-34.

Abstract

DNA sequence features were sought that could be used for the in silico ascertainment of the likely functional consequences of single nucleotide changes in human gene promoter regions. To identify relevant features of the local DNA sequence context, we transformed into consensus tables the nucleotide composition of sequences flanking 101 promoter SNPs of type C<-->T or A<-->G, defined empirically as being either 'functional' or 'non-functional' on the basis of a standardised reporter gene assay. The similarity of a given sequence to these consensus tables was then measured by means of the Shapiro-Senapathy score. A decision rule was proposed that discriminated between empirically ascertained functional and non-functional SNPs with a sensitivity of 80% and a specificity of 20%. Two further datasets (viz. disease-associated SNPs of types A<-->G and C<-->T (N = 75) and pathological promoter mutations (transitions, N = 114)) were retrieved from the Human Gene Mutation Database (HGMD; http://www.hgmd.org/) and analyzed using consensus tables derived from the functional and non-functional promoter SNPs; approximately 70% were correctly recognized as being of probable functional significance. Complexity analysis was also used to quantify the regularity of the local DNA sequence environment. Functional SNPs/mutations of type C<-->T were found to occur in DNA regions characterized by lower average sequence complexity as measured with respect to symmetric elements; complexity values increased gradually from functional SNPs and pathological mutations to functional disease-associated SNPs and non-functional SNPs. This may reflect the internal axial symmetry that frequently characterizes transcription factor binding sites.
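The consensus-table and scoring steps described above can be illustrated with a minimal sketch. It assumes the commonly cited form of the Shapiro-Senapathy score, 100 × (t − min) / (max − min), where t is the sum of the per-position percentages of the bases in the query sequence; the abstract does not give the exact formula or the decision rule, so the `classify` function below is a hypothetical rule (pick the table with the higher score), and the toy sequences are invented for illustration rather than taken from the study.

```python
from collections import Counter

BASES = "ACGT"

def consensus_table(seqs):
    """Per-position percentage of each base across equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)  # missing bases count as 0
        table.append({b: 100.0 * counts[b] / len(seqs) for b in BASES})
    return table

def ss_score(seq, table):
    """Shapiro-Senapathy-style similarity of seq to a table, scaled to 0..100."""
    if len(seq) != len(table):
        raise ValueError("sequence and table lengths differ")
    t = sum(col[b] for b, col in zip(seq, table))
    lo = sum(min(col.values()) for col in table)
    hi = sum(max(col.values()) for col in table)
    if hi == lo:
        return 0.0  # flat table carries no positional information
    return 100.0 * (t - lo) / (hi - lo)

def classify(seq, functional_table, nonfunctional_table):
    """Hypothetical decision rule (an assumption, not the paper's rule)."""
    f = ss_score(seq, functional_table)
    n = ss_score(seq, nonfunctional_table)
    return "functional" if f > n else "non-functional"

# Invented toy flanking sequences, for illustration only.
functional_seqs = ["ACGTA", "ACGTT", "ACGAA"]
nonfunctional_seqs = ["TTGCA", "TTGCC", "TAGCA"]

table_f = consensus_table(functional_seqs)
table_n = consensus_table(nonfunctional_seqs)

label_a = classify("ACGTA", table_f, table_n)
label_b = classify("TTGCA", table_f, table_n)
```

In this sketch a sequence that matches the most frequent base at every position of a table scores 100 against that table, and the score falls toward 0 as it matches the least frequent bases.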

MeSH terms

  • Algorithms*
  • Base Sequence
  • Binding Sites
  • Databases, Nucleic Acid
  • Genetic Predisposition to Disease*
  • Genome, Human
  • Humans
  • Mutation*
  • Polymorphism, Single Nucleotide*
  • Promoter Regions, Genetic*
  • Sequence Analysis, DNA / methods*
  • Transcription Factors / genetics

Substances

  • Transcription Factors