Modelling splice sites with locality-sensitive sequence features

Int J Data Min Bioinform. 2013;7(1):78-102. doi: 10.1504/ijdmb.2013.050979.

Abstract

Splice sites are essential for pre-mRNA maturation and are central to Splice Site Modelling (SSM); however, gaps remain between the splicing signals and the sequence features identified computationally. This paper proposes Locality Sensitive Features (LSFs) to reduce these gaps by homogenising their contexts. Using skewness-kurtosis based statistics and data analysis, SSM with LSFs is carried out by double-boundary outlier filters. The LSF-based SSM was applied to six model organisms of diverse species; accuracy and Receiver Operating Characteristic (ROC) analysis give promising results, indicating that the proposed methodology is versatile and robust for splice-site classification. The LSF-based SSM may serve as a new infrastructure for developing effective splice-site prediction methods and has the potential to be applied to other sequence prediction problems.
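The abstract does not say how the double-boundary outlier filters are built, so the following is only a minimal sketch of the general idea under stated assumptions: a candidate site is accepted when a single numeric feature score falls between a lower and an upper bound learned from known true sites. The function names `moments` and `double_boundary_filter`, the use of mean ± k·sd as the two boundaries, and the parameter `k` are all hypothetical choices made for illustration, not the authors' method.

```python
import math


def moments(xs):
    """Return (mean, sd, skewness, excess kurtosis) using population moments."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one score")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("scores have zero variance")
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0  # excess kurtosis: 0 for a normal distribution
    return mean, math.sqrt(m2), skew, kurt


def double_boundary_filter(train_scores, k=2.0):
    """Build a two-sided filter from scores of known true sites.

    Assumption: boundaries are mean - k*sd and mean + k*sd. The paper may
    derive its boundaries differently (for example from skewness/kurtosis).
    """
    mean, sd, _, _ = moments(train_scores)
    lower, upper = mean - k * sd, mean + k * sd

    def accept(score):
        # A score outside either boundary is treated as an outlier.
        return lower <= score <= upper

    return accept


# Hypothetical feature scores for known true splice sites.
true_site_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, sd, skew, kurt = moments(true_site_scores)
is_candidate = double_boundary_filter(true_site_scores, k=2.0)
```

Here `is_candidate(3.0)` accepts a score near the centre of the training distribution, while a score far above or below it is rejected. Skewness and kurtosis are reported alongside the mean and standard deviation because a strongly skewed or heavy-tailed score distribution would make symmetric boundaries a poor fit.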

MeSH terms

  • Models, Theoretical
  • RNA Precursors / chemistry*
  • RNA Precursors / metabolism*
  • RNA Splice Sites*
  • RNA Splicing*
  • ROC Curve

Substances

  • RNA Precursors
  • RNA Splice Sites