Intron identification approaches based on weighted features and fuzzy decision trees

Comput Biol Med. 2012 Jan;42(1):112-22. doi: 10.1016/j.compbiomed.2011.10.015. Epub 2011 Nov 20.

Abstract

Current computational predictions of splice sites depend largely on the sequence patterns of known intronic sequence features (ISFs) described in the classical intron definition model (IDM). The computation-oriented IDM (CO-IDM) provides more specific and concrete information for describing the intron flanks of splice sites (IFSSs). In this paper, we propose a novel fuzzy decision tree (FDT) approach that uses (1) weighted ISFs drawn from twelve uni-frame patterns (UFPs) and forty-five multi-frame patterns (MFPs) and (2) gain ratios to improve performance in identifying introns. First, we fuzzified features extracted from genomic sequences using membership functions obtained with an unsupervised self-organizing map (SOM) technique. Then, we introduced two viewpoints, global weighting and cross-referencing, into the generation of fuzzy rules, which are interpretable and help biologists verify whether a sequence is an intron. Finally, the experimental results showed that the proposed method improves identification accuracy. We also implemented an online intron identifier that classifies an unknown genomic sequence.
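The gain ratio mentioned above is the standard decision-tree splitting criterion: the information gain of a feature divided by the entropy of the feature's own value distribution. The following is a minimal sketch of the crisp (non-fuzzy) form only, not the paper's weighted fuzzy variant. The function names, the feature values ("low", "high"), and the class labels ("intron", "other") are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Crisp gain ratio of one discrete feature with respect to class labels.

    Assumption: this is the textbook (C4.5-style) definition, not the
    weighted fuzzy version used in the paper.
    """
    n = len(labels)

    # Group the class labels by the feature value they co-occur with.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)

    # Expected entropy of the labels after splitting on the feature.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder

    # Split information penalizes features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: one linguistic term per sequence and its class.
labels = ["intron", "intron", "other", "other"]
informative = ["high", "high", "low", "low"]
uninformative = ["high", "low", "high", "low"]

print(gain_ratio(informative, labels))
print(gain_ratio(uninformative, labels))
```

In a fuzzy decision tree, the counts in this sketch would typically be replaced by sums of membership degrees, so that each sequence contributes partially to several linguistic terms.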

MeSH terms

  • Computational Biology
  • Decision Trees*
  • Fuzzy Logic*
  • Humans
  • Introns*
  • Models, Genetic*