Corpus based learning of stochastic context-free grammar combined with hidden Markov models for tRNA modelling

Conf Proc IEEE Eng Med Biol Soc. 2004:2004:2785-8. doi: 10.1109/IEMBS.2004.1403796.

Abstract

The tRNA molecule has a well-known secondary structure, formed when it folds back on itself so that distant nucleotides pair. This paper presents a syntactic pattern recognition methodology for modelling the tRNA secondary structure with stochastic context-free grammars (SCFGs). The structural regions (paired nucleotides) are learned from categorized samples annotated with fully labelled trees, using a corpus-based estimation algorithm. The nonstructural regions are modelled by hidden Markov models (HMMs), which are then transformed into stochastic regular grammars so that they can be fused with the structural regions. In tests with positive and negative samples, compared with the work of Sakakibara, the method achieved a 1.81% sequence error rate (SER), 98.43% precision and 100% recall, and a 100% SER on the negative test. The corpus-based algorithm is computationally efficient and requires fewer training samples to converge to the correct model of the tRNA secondary structure.
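Corpus-based estimation from fully labelled trees can be illustrated with the standard relative-frequency (maximum-likelihood) estimator, P(A → α) = count(A → α) / count(A). The sketch below is an assumption-laden illustration, not the paper's exact algorithm: the tree encoding, the nonterminal names `S` (stem) and `L` (loop), and the toy corpus are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple, Union

# Hypothetical tree encoding: a terminal is a str; an internal node is
# (nonterminal_label, [children]).
Tree = Union[str, Tuple[str, List["Tree"]]]
Rule = Tuple[str, Tuple[str, ...]]


def productions(tree: Tree) -> Iterator[Rule]:
    """Yield the rule (lhs, rhs) used at every internal node of a labelled tree."""
    if isinstance(tree, str):  # terminals contribute no rule
        return
    label, children = tree
    # The right-hand side lists each child's symbol: a terminal or a child label.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        yield from productions(child)


def estimate_scfg(corpus: List[Tree]) -> Dict[Rule, float]:
    """Relative-frequency estimate of SCFG rule probabilities from labelled trees."""
    rule_counts: Counter = Counter()
    lhs_counts: Counter = Counter()
    for tree in corpus:
        for lhs, rhs in productions(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    # P(A -> alpha) = count(A -> alpha) / count(A)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Toy corpus (hypothetical): S emits a base pair around an inner structure,
# L emits unpaired loop nucleotides.
CORPUS: List[Tree] = [
    ("S", ["g", ("S", ["a", ("L", ["u", "u"]), "u"]), "c"]),
    ("S", ["g", ("L", ["a", "a"]), "c"]),
]

if __name__ == "__main__":
    for (lhs, rhs), p in sorted(estimate_scfg(CORPUS).items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.3f}")
```

For these two toy trees, each of the three `S` rules receives probability 1/3 and each of the two `L` rules 1/2. Because every rule is read directly off an annotated tree, no iterative inside-outside re-estimation is needed, which is consistent with the abstract's claim that a corpus-based approach is computationally cheap.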
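The transformation of an HMM into a stochastic regular grammar can be sketched with the textbook construction below. It assumes an HMM whose states emit symbols, with an explicit per-state end probability; this parameterization, the nonterminal names `N0`, `N1`, … and `S`, and the example parameters are hypothetical and may differ from the paper. Each state i becomes a nonterminal N_i with rules N_i → x N_j (probability b_i(x)·a_ij) and N_i → x (probability b_i(x)·end_i). The forward algorithm is included only to check that the grammar assigns every sequence the same probability as the HMM.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rule = Tuple[str, Tuple[str, ...]]


def hmm_to_srg(pi: List[float], A: List[List[float]], end: List[float],
               B: List[Dict[str, float]]) -> Dict[Rule, float]:
    """Convert an emitting-state HMM into a right-linear stochastic grammar.

    pi[i]    initial probability of state i
    A[i][j]  transition probability from state i to state j
    end[i]   probability of stopping after state i (sum_j A[i][j] + end[i] == 1)
    B[i][x]  probability that state i emits symbol x
    """
    n = len(pi)
    rules: Dict[Rule, float] = defaultdict(float)
    for i in range(n):
        for x, b in B[i].items():
            for j in range(n):
                if A[i][j] > 0:
                    # N_i -> x N_j : emit x in state i, then move to state j
                    rules[(f"N{i}", (x, f"N{j}"))] += b * A[i][j]
            if end[i] > 0:
                # N_i -> x : emit x in state i, then stop
                rules[(f"N{i}", (x,))] += b * end[i]
    # Start symbol S mixes the N_i rules with weights pi[i] (no unit rules).
    for (lhs, rhs), p in list(rules.items()):
        rules[("S", rhs)] += pi[int(lhs[1:])] * p
    return dict(rules)


def srg_string_prob(rules: Dict[Rule, float], s: str) -> float:
    """Sum of the probabilities of all derivations S =>* s."""
    alpha: Dict[str, float] = {"S": 1.0}  # mass of derived prefixes per nonterminal
    prob = 0.0
    for k, x in enumerate(s):
        last = k == len(s) - 1
        nxt: Dict[str, float] = defaultdict(float)
        for (lhs, rhs), p in rules.items():
            if lhs in alpha and rhs[0] == x:
                if len(rhs) == 2 and not last:
                    nxt[rhs[1]] += alpha[lhs] * p
                elif len(rhs) == 1 and last:
                    prob += alpha[lhs] * p
        alpha = nxt
    return prob


def hmm_forward(pi, A, end, B, s: str) -> float:
    """Forward algorithm with an explicit end probability per state."""
    n = len(pi)
    f = [pi[i] * B[i].get(s[0], 0.0) for i in range(n)]
    for x in s[1:]:
        f = [sum(f[i] * A[i][j] for i in range(n)) * B[j].get(x, 0.0)
             for j in range(n)]
    return sum(f[i] * end[i] for i in range(n))


# Hypothetical two-state HMM for an unpaired region: AU-rich vs GC-rich state.
PI = [0.6, 0.4]
A = [[0.7, 0.2], [0.1, 0.8]]
END = [0.1, 0.1]
B = [{"a": 0.4, "u": 0.4, "c": 0.1, "g": 0.1},
     {"a": 0.1, "u": 0.1, "c": 0.4, "g": 0.4}]

if __name__ == "__main__":
    rules = hmm_to_srg(PI, A, END, B)
    for seq in ["a", "gc", "auuc"]:
        print(seq, srg_string_prob(rules, seq), hmm_forward(PI, A, END, B, seq))
```

A right-linear grammar of this form is a special case of an SCFG, so once the loop regions are expressed this way their rules can be combined with the stem rules inside a single stochastic grammar.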