TI2BioP: Topological Indices to BioPolymers. Its practical use to unravel cryptic bacteriocin-like domains

Amino Acids. 2011 Feb;40(2):431-42. doi: 10.1007/s00726-010-0653-9. Epub 2010 Jun 19.

Abstract

Bacteriocins are proteinaceous toxins produced and exported by both gram-negative and gram-positive bacteria as a defense mechanism. The bacteriocin protein family is highly diverse, which complicates the identification of bacteriocin-like sequences using alignment approaches. The use of topological indices (TIs) irrespective of sequence similarity can be a promising alternative to predict proteinaceous bacteriocins. Thus, we present Topological Indices to BioPolymers (TI2BioP) as an alignment-free approach inspired in both the Topological Substructural Molecular Design (TOPS-MODE) and Markov Chain Invariants for Network Selection and Design (MARCH-INSIDE) methodology. TI2BioP allows the calculation of the spectral moments as simple TIs to seek quantitative sequence-function relationships (QSFR) models. Since hydrophobicity and basicity are major criteria for the bactericide activity of bacteriocins, the spectral moments ((HP)μ(k)) were derived for the first time from protein artificial secondary structures based on amino acid clustering into a Cartesian system of hydrophobicity and polarity. Several orders of (HP)μ(k) characterized numerically 196 bacteriocin-like sequences and a control group made up of 200 representative CATH domains. Subsequently, they were used to develop an alignment-free QSFR model allowing a 76.92% discrimination of bacteriocin proteins from other domains, a relevant result considering the high sequence diversity among the members of both groups. The model showed a prediction overall performance of 72.16%, detecting specifically 66.7% of proteinaceous bacteriocins whereas the InterProScan retrieved just 60.2%. As a practical validation, the model also predicted successfully the cryptic bactericide function of the Cry 1Ab C-terminal domain from Bacillus thuringiensis's endotoxin, which has not been detected by classical alignment methods.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Bacteriocins / chemistry*
  • Biopolymers / chemistry*
  • Computational Biology
  • Hydrophobic and Hydrophilic Interactions
  • Molecular Sequence Data
  • Protein Structure, Tertiary
  • Sequence Alignment

Substances

  • Bacteriocins
  • Biopolymers