Non-linear models based on simple topological indices to identify RNase III protein members

J Theor Biol. 2011 Mar 21;273(1):167-78. doi: 10.1016/j.jtbi.2010.12.019. Epub 2010 Dec 28.

Abstract

Alignment-free classifiers are especially useful for the functional classification of protein classes with variable homology and different domain structures. The Topological Indices to BioPolymers (TI2BioP) methodology (Agüero-Chapin et al., 2010), inspired by both the TOPS-MODE and the MARCH-INSIDE methodologies, allows the calculation of simple topological indices (TIs) as alignment-free classifiers. These indices were derived from the clustering of the amino acids into four classes of hydrophobicity and polarity, revealing higher sequence-order information beyond the amino acid composition level. The predictive power of such TIs was evaluated for the first time on the RNase III family, chosen for the high diversity of its members (primary sequence and domain organization). Three non-linear models were developed for RNase III class prediction: a Decision Tree Model (DTM), an Artificial Neural Network (ANN) model and a Hidden Markov Model (HMM). The first two are alignment-free approaches that use TIs as input predictors. Their performances were compared with that of a non-classical HMM, modified according to our amino acid clustering strategy. The alignment-free models showed similar performances on the training and test sets, reaching values above 90% in the overall classification. The non-classical HMM showed the highest classification rate, with values above 95% in training and 100% in test. Despite the higher accuracy of the HMM, the DTM provided a simple RNase III classification at low computational cost. This simplicity was evaluated with respect to the HMM and ANN models for the functional annotation of a new bacterial RNase III class member, isolated and annotated by our group.
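
As an illustration of how clustering amino acids into four classes can expose sequence-order information beyond composition, the following is a minimal sketch and not the TI2BioP implementation. The four-class partition used here (nonpolar, polar uncharged, acidic, basic) is an assumed textbook grouping, since the abstract does not list the actual class membership, and the function names `reduce_sequence` and `class_descriptors` are hypothetical. The descriptors are plain class frequencies and adjacent-class pair frequencies, which stand in for the topological indices only to show the general idea.

```python
from collections import Counter
from itertools import product

# ASSUMPTION: a common textbook four-class partition of the 20 amino acids.
# The paper's actual hydrophobicity/polarity classes are not given in the abstract.
CLASSES = {
    "N": "AVLIPFWMG",  # nonpolar
    "P": "STCYNQ",     # polar, uncharged
    "A": "DE",         # acidic
    "B": "KRH",        # basic
}
AA_TO_CLASS = {aa: label for label, members in CLASSES.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the four-letter class alphabet.

    Residues outside the 20 standard amino acids are skipped.
    """
    return "".join(AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS)


def class_descriptors(seq):
    """Return alignment-free descriptors of a sequence (hypothetical helper).

    comp_X  : fraction of residues in class X (composition level)
    pair_XY : fraction of adjacent residue pairs going from class X to class Y
              (a simple form of sequence-order information)
    """
    reduced = reduce_sequence(seq)
    n = len(reduced)
    comp = Counter(reduced)
    pairs = Counter(zip(reduced, reduced[1:]))
    n_pairs = max(n - 1, 1)  # avoid division by zero for very short input

    features = {f"comp_{c}": (comp[c] / n if n else 0.0) for c in CLASSES}
    for a, b in product(CLASSES, repeat=2):
        features[f"pair_{a}{b}"] = pairs[(a, b)] / n_pairs
    return features
```

Each sequence yields a fixed-length vector of 20 values (4 composition terms plus 16 pair terms) regardless of its length, which is what lets such descriptors feed a decision tree or neural network without any alignment step.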

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Decision Trees
  • Enzyme Assays
  • Escherichia coli / enzymology
  • Markov Chains
  • Molecular Sequence Data
  • Neural Networks, Computer
  • Nonlinear Dynamics*
  • Protein Conformation
  • ROC Curve
  • Recombinant Proteins / chemistry
  • Recombinant Proteins / metabolism
  • Reproducibility of Results
  • Ribonuclease III / chemistry*
  • Ribonuclease III / isolation & purification
  • Sequence Alignment

Substances

  • Recombinant Proteins
  • Ribonuclease III

Associated data

  • GENBANK/GU190214