Corpus based learning of stochastic context-free grammar combined with hidden Markov models for tRNA modelling

Juan Miguel Garcia-Gomez; Jose Miguel Benedi

doi:10.1109/IEMBS.2004.1403796

Conf Proc IEEE Eng Med Biol Soc. 2004:2004:2785-8. doi: 10.1109/IEMBS.2004.1403796.

Authors

Juan Miguel Garcia-Gomez¹, Jose Miguel Benedi

Affiliation

¹ Informatica Medica-BET, Politecnico de Valencia, Spain.

PMID: 17270855
DOI: 10.1109/IEMBS.2004.1403796

Abstract

The tRNA molecule has a well-known secondary structure, into which it folds through the pairing of distant nucleotides. This paper presents a syntactic pattern recognition methodology for modelling tRNA secondary structure using stochastic context-free grammars. To learn the models, the structural regions (paired nucleotides) were learned from categorized samples with fully labelled trees using a corpus-based estimation algorithm. The nonstructural regions were modelled by hidden Markov models, which were then transformed into stochastic regular grammars so that they could be merged with the structural regions. In tests with positive and negative samples, compared against Sakakibara, the approach achieved a sequence error rate (SER) of 1.81%, a precision of 98.43% and a recall of 100%, and an SER of 100% on the negative test. The corpus-based algorithm is computationally efficient and requires fewer training samples to converge to the correct model of the tRNA secondary structure.
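The abstract does not describe how a hidden Markov model is transformed into a stochastic regular grammar, so the following is only a minimal sketch of the textbook construction, not the authors' implementation. It assumes a state-emitting HMM with an explicit end transition. Each state becomes a nonterminal, and each (emission, transition) pair becomes a right-linear rule whose probability is the product of the two. The names `END`, `hmm_to_regular_grammar`, `string_probability` and the one-state toy model for an unpaired region are all hypothetical.

```python
from collections import defaultdict

END = "END"  # hypothetical marker for the HMM's terminating transition


def hmm_to_regular_grammar(transitions, emissions):
    """Return right-linear rules as (lhs, terminal, rhs_or_None, prob).

    A rule (A, x, B, p) stands for A -> x B; rhs None stands for A -> x.
    """
    rules = []
    for state, outgoing in transitions.items():
        for symbol, p_emit in emissions[state].items():
            for target, p_trans in outgoing.items():
                rhs = None if target == END else target
                rules.append((state, symbol, rhs, p_emit * p_trans))
    return rules


def string_probability(rules, start, sequence):
    """Probability that the grammar derives `sequence` from `start`."""
    # alpha[A]: probability of deriving the prefix read so far
    # with nonterminal A still waiting to be expanded.
    alpha = {start: 1.0}
    finished = 0.0
    for i, symbol in enumerate(sequence):
        is_last = i == len(sequence) - 1
        next_alpha = defaultdict(float)
        finished = 0.0
        for lhs, terminal, rhs, prob in rules:
            if terminal != symbol or lhs not in alpha:
                continue
            if rhs is None:
                # A terminating rule is only usable on the final symbol.
                if is_last:
                    finished += alpha[lhs] * prob
            else:
                next_alpha[rhs] += alpha[lhs] * prob
        alpha = next_alpha
    return finished


# Toy one-state HMM for an unpaired region (illustrative values only).
transitions = {"loop": {"loop": 0.8, END: 0.2}}
emissions = {"loop": {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}}

rules = hmm_to_regular_grammar(transitions, emissions)
p_ga = string_probability(rules, "loop", "GA")
```

Because every rule is right-linear, the resulting nonterminals can in principle appear inside the productions of a stochastic context-free grammar, which is one plausible way to merge nonstructural regions with the paired regions. How the paper actually performs that merge is not stated in the abstract.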