A novel method for splice sites prediction using sequence component and hidden Markov model

Annu Int Conf IEEE Eng Med Biol Soc. 2016 Aug:2016:3076-3079. doi: 10.1109/EMBC.2016.7591379.

Abstract

With the rapid growth of DNA sequence data, there is an urgent need for new methods that predict genes accurately. The performance of gene detection methods depends mainly on the efficiency of splice site prediction methods. In this paper, a novel method for detecting splice sites is proposed that uses a new, effective DNA encoding method and the AdaBoost.M1 classifier. The proposed DNA encoding method is based on multi-scale component (MSC) and the first-order Markov model (MM1). It was applied to the HS3D dataset with repeated 10-fold cross-validation. The experimental results indicate that the new method increased classification accuracy and outperformed several current methods, including MM1-SVM, Reduced MM1-SVM, SVM-B, LVMM, DM-SVM, DM2-AdaBoost, and MSC+Pos(+APR)-SVM.
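As an illustration of the MM1 part of the encoding only, the sketch below shows one common way a first-order Markov model can turn a fixed-length DNA window into a numeric feature vector: position-specific transition probabilities are estimated from training sequences, and each sequence is then represented by the probabilities of its own adjacent-nucleotide transitions. The abstract does not give the authors' exact formulation, so the function names (fit_mm1, encode_mm1), the add-one pseudocount, and the restriction to the A/C/G/T alphabet are assumptions made for this example. The MSC features and the AdaBoost.M1 classifier are not shown.

```python
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: no ambiguous bases such as N


def fit_mm1(sequences, pseudocount=1.0):
    """Estimate position-specific first-order transition probabilities.

    Returns a list `probs` where probs[i][(prev, cur)] approximates
    P(base at position i == cur | base at position i-1 == prev).
    Index 0 is left empty because position 0 has no predecessor.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")

    # Count each (previous base, current base) pair at every position.
    counts = [defaultdict(float) for _ in range(length)]
    for seq in sequences:
        for i in range(1, length):
            counts[i][(seq[i - 1], seq[i])] += 1

    # Normalise the counts per position and per previous base,
    # with additive smoothing (hypothetical choice for this sketch).
    probs = [dict() for _ in range(length)]
    for i in range(1, length):
        for prev in ALPHABET:
            total = sum(counts[i][(prev, c)] for c in ALPHABET)
            total += pseudocount * len(ALPHABET)
            for cur in ALPHABET:
                probs[i][(prev, cur)] = (counts[i][(prev, cur)] + pseudocount) / total
    return probs


def encode_mm1(seq, probs):
    """Encode a sequence as the probabilities of its observed transitions."""
    return [probs[i][(seq[i - 1], seq[i])] for i in range(1, len(seq))]


# Toy usage: three aligned training windows, then encode one of them.
train = ["ACGT", "ACGA", "ACGT"]
model = fit_mm1(train)
features = encode_mm1("ACGT", model)
```

A window of length n yields n - 1 features under this scheme. In a full pipeline, such vectors (possibly combined with other features) would be passed to a classifier and evaluated with cross-validation; those steps are outside this sketch.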

MeSH terms

  • Base Sequence
  • Computational Biology / methods*
  • Markov Chains*
  • RNA Splice Sites / genetics*
  • Support Vector Machine

Substances

  • RNA Splice Sites