Exploring alternative transcript structure in the human genome using blocks and InterPro

J Bioinform Comput Biol. 2003 Jul;1(2):289-306. doi: 10.1142/s0219720003000113.

Abstract

Understanding how alternative splicing affects gene function is an important challenge facing modern-day molecular biology. Using homology-based protein sequence analysis methods, it should be possible to investigate how transcript diversity impacts protein function. To test this, high-quality exon-intron structures were deduced for over 8000 human genes, including over 1300 (17 percent) that produce multiple transcript variants. A data mining technique (DiffMotif) was developed to identify genes in which transcript variation coincides with changes in conserved motifs between variants. Applying this method, we found that 30 percent of the multi-variant genes in our test set exhibited a differential profile of conserved InterPro and/or BLOCKS motifs across different mRNA variants. To investigate these genes, a visualization tool (ProtAnnot) that displays amino acid motifs in the context of genomic sequence was developed. Using this tool, genes revealed by the DiffMotif method were analyzed, and when possible, hypotheses regarding the potential role of alternative transcript structure in modulating gene function were developed. Examples are presented, including MEOX1, a homeobox-containing protein; AIRE, involved in autoimmune disease; PLAT, tissue-type plasminogen activator; and CD79b, a component of the B-cell receptor complex. These results demonstrate that amino acid motif databases like BLOCKS and InterPro are useful tools for investigating how alternative transcript structure affects gene function.
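
The abstract does not describe how DiffMotif is implemented, so the following is only a minimal sketch of the general idea, assuming motif hits have already been assigned to each transcript variant. The function name, the input layout (gene, then variant, then a set of motif identifiers), and the placeholder gene, variant, and motif names are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def differential_motif_genes(
    motifs_by_variant: Dict[str, Dict[str, Set[str]]]
) -> Dict[str, Set[str]]:
    """Return genes whose variants do not all carry the same motifs.

    Hypothetical input layout: gene -> variant -> set of motif IDs.
    The result maps each flagged gene to the motifs that are present
    in at least one variant but absent from at least one other.
    """
    flagged: Dict[str, Set[str]] = {}
    for gene, variants in motifs_by_variant.items():
        # Single-variant genes cannot show a differential profile.
        if len(variants) < 2:
            continue
        motif_sets = [set(m) for m in variants.values()]
        shared = set.intersection(*motif_sets)  # motifs in every variant
        observed = set.union(*motif_sets)       # motifs in any variant
        differential = observed - shared
        if differential:
            flagged[gene] = differential
    return flagged


# Placeholder data, not real annotations.
example = {
    "geneA": {"v1": {"motif1", "motif2"}, "v2": {"motif1"}},
    "geneB": {"v1": {"motif3"}, "v2": {"motif3"}},
    "geneC": {"v1": {"motif4"}},
}
result = differential_motif_genes(example)
print(result)  # {'geneA': {'motif2'}}
```

In this sketch a gene is flagged as soon as any motif is missing from any variant. A real analysis would probably also need to account for motif hit quality and for variants that differ only in untranslated regions, which this comparison ignores.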

MeSH terms

  • Algorithms
  • Alternative Splicing / genetics*
  • Amino Acid Motifs / genetics
  • Chromosome Mapping / methods*
  • Conserved Sequence
  • Databases, Protein*
  • Gene Expression Regulation / genetics
  • Genetic Variation
  • Genome, Human*
  • Humans
  • Proteins / chemistry
  • Proteins / genetics
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Structure-Activity Relationship
  • Transcription Factors / genetics*

Substances

  • Proteins
  • Transcription Factors