Efficient prediction of alternative splice forms using protein domain homology

In Silico Biol. 2004;4(2):195-208.

Abstract

Alternative splicing can yield many different mature mRNAs from a single precursor. New findings indicate that alternative splicing occurs much more often than previously assumed. A major goal of functional genomics is to elucidate and characterize the entire spectrum of alternative splice forms. Existing approaches, such as EST alignments, detect alternative splice forms from the mRNA sequence alone and do not consider the function and characteristics of the resulting proteins. One important example of such functional characterization is homology to a known protein domain family. Profile hidden Markov models (HMMs), such as those stored in the Pfam database, provide a powerful description of protein domains. In this paper, we address the problem of identifying the splice form with the highest similarity to a protein domain family, taking all possible splice forms into consideration. As demonstrated here for a number of genes, this homology-based approach can successfully predict partial gene structures. Furthermore, we present several novel splice form predictions with high-scoring protein domain homology and point out that detecting splice-form-specific protein domains can help answer questions concerning hereditary diseases. Simple approaches that run a BLASTP search on each candidate cannot be applied here, since the number of possible splice forms increases exponentially with the number of exons. We have therefore developed an efficient polynomial-time algorithm, called ASFPred (Alternative Splice Form Prediction), which requires only a set of exons as input.
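The gap between exponential enumeration and a polynomial-time search can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not ASFPred itself: it replaces the Pfam profile HMM with a simple position-specific score table and linear gap penalties (a Smith-Waterman-style local alignment), assumes the exons are already translated into amino-acid strings in a single reading frame, and models a splice form only as a non-empty, order-preserving subset of exons. All function names, the constants `GAP` and `MISMATCH`, and the toy exons and profile are hypothetical. `brute_force` scores all 2^n - 1 exon subsets explicitly. `exon_graph_dp` instead treats the exons as a directed acyclic graph, in which the last residue of each exon links to the first residue of every later exon, and aligns that graph to the profile, reaching the same optimum in time polynomial in the input size.

```python
from itertools import combinations

GAP = -2       # hypothetical linear gap penalty (insertion or deletion)
MISMATCH = -1  # hypothetical score for a residue missing from a column's table


def sw_step(prev, ch, profile):
    """Compute one Smith-Waterman row for residue `ch`, given its predecessor row."""
    cur = [0] * (len(profile) + 1)
    for j in range(1, len(profile) + 1):
        cur[j] = max(
            0,                                               # start a new local alignment
            prev[j - 1] + profile[j - 1].get(ch, MISMATCH),  # residue aligned to column j
            prev[j] + GAP,                                   # residue inserted (no column)
            cur[j - 1] + GAP,                                # column j deleted
        )
    return cur


def local_score(seq, profile):
    """Best local alignment score of one protein string against the profile."""
    row, best = [0] * (len(profile) + 1), 0
    for ch in seq:
        row = sw_step(row, ch, profile)
        best = max(best, max(row))
    return best


def brute_force(exons, profile):
    """Exponential baseline: score all 2**n - 1 order-preserving exon subsets."""
    best = 0
    for r in range(1, len(exons) + 1):
        for combo in combinations(range(len(exons)), r):
            best = max(best, local_score("".join(exons[i] for i in combo), profile))
    return best


def exon_graph_dp(exons, profile):
    """Polynomial DP over the exon graph: exon a may be followed by any later exon b.

    Time is linear in (total exon length) * len(profile); traceback is omitted.
    """
    ends = [0] * (len(profile) + 1)  # column-wise max of last-residue rows of earlier exons
    best = 0
    for exon in exons:
        row = ends  # first residue may follow any earlier exon, or start a new splice form
        for ch in exon:
            row = sw_step(row, ch, profile)
            best = max(best, max(row))
        ends = [max(a, b) for a, b in zip(ends, row)]
    return best


exons = ["MKV", "LLA", "GHW", "PLV"]   # toy translated exons (hypothetical)
profile = [{c: 3} for c in "MKVGHW"]    # toy profile: +3 for each consensus residue
print(exon_graph_dp(exons, profile), brute_force(exons, profile))  # prints "18 18"
```

The key step is that the first residue of each exon uses, as its predecessor row, the column-wise maximum over the final rows of all earlier exons. Because the recurrence only adds constants to predecessor scores, that single maximum stands in for every possible preceding exon at once, so no splice form has to be enumerated. In the toy example the best splice form skips the second exon ("MKV" + "GHW"), whereas the full concatenation scores only 12. A realistic version would also need a traceback to report the winning splice form and would have to track reading-frame phase across exon boundaries.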

MeSH terms

  • Algorithms
  • Alternative Splicing*
  • Codon
  • Computational Biology / methods*
  • Databases as Topic
  • Databases, Protein
  • Exons
  • Expressed Sequence Tags
  • Genomics
  • Humans
  • Markov Chains
  • Models, Genetic*
  • Models, Statistical
  • Protein Structure, Tertiary
  • RNA, Messenger / metabolism
  • Software

Substances

  • Codon
  • RNA, Messenger