Computational comparative study of tuberculosis proteomes using a model learned from signal peptide structures

PLoS One. 2012;7(4):e35018. doi: 10.1371/journal.pone.0035018. Epub 2012 Apr 9.

Abstract

Secretome analysis is important in pathogen studies. A fundamental and convenient way to identify secreted proteins is to first predict signal peptides, which are essential for protein secretion. However, signal peptides are highly complex functional sequences that are easily confused with transmembrane domains, and such confusion affects the discovery of secreted proteins. Transmembrane proteins are important drug targets, but very few transmembrane protein structures have been determined experimentally; hence, prediction of the structures is essential. In the field of structure prediction, researchers do not make assumptions about organisms, so there is a need for a general signal peptide predictor.

To improve signal peptide prediction without prior knowledge of the associated organisms, we present a machine-learning method, called SVMSignal, which uses biochemical properties as features, as well as features acquired from a novel encoding, to capture biochemical profile patterns for learning the structures of signal peptides directly.

We tested SVMSignal and five popular methods on two benchmark datasets from the SPdb and UniProt/Swiss-Prot databases, respectively. Although SVMSignal was trained on an old dataset, it performed well, and the results demonstrate that learning the structures of signal peptides directly is a promising approach. We also utilized SVMSignal to analyze proteomes in the entire HAMAP microbial database. Finally, we conducted a comparative study of secretome analysis on seven tuberculosis-related strains selected from the HAMAP database. We identified ten potential secreted proteins, two of which are drug resistant and four of which are potential transmembrane proteins.

SVMSignal is publicly available at http://bio-cluster.iis.sinica.edu.tw/SVMSignal. It provides user-friendly interfaces and visualizations, and the prediction results are available for download.
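As a rough illustration of what "biochemical properties as features" can mean in practice, the sketch below turns the N-terminal region of a protein sequence into a fixed-length hydropathy profile that a classifier such as an SVM could consume. It is not the SVMSignal encoding, which the abstract does not describe. The Kyte-Doolittle hydropathy scale, the 30-residue N-terminal region, the 5-residue smoothing window, and the function names `hydropathy_profile` and `feature_vector` are all assumptions chosen for this example.

```python
# Kyte-Doolittle hydropathy index (assumed here as one example of a
# biochemical property; the abstract does not name the scales SVMSignal uses).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, n_term=30, window=5):
    """Sliding-window mean hydropathy over the first n_term residues.

    Unknown residue codes (e.g. X) are scored as 0.0.
    Returns an empty list if the region is shorter than the window.
    """
    region = seq[:n_term].upper()
    values = [KD.get(aa, 0.0) for aa in region]
    if len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def feature_vector(seq, n_term=30, window=5):
    """Fixed-length feature vector: the profile, zero-padded on the right."""
    profile = hydropathy_profile(seq, n_term, window)
    width = n_term - window + 1
    return profile + [0.0] * (width - len(profile))


if __name__ == "__main__":
    # Hypothetical sequence: charged start, hydrophobic core, polar tail.
    example = "MKKR" + "LLALLVAVLLAA" + "SQAAETPTRD" + "GSTK" * 5
    for position, score in enumerate(feature_vector(example)):
        print(position, round(score, 2))
```

A signal peptide typically has a positively charged start followed by a hydrophobic core, so a profile of this kind tends to dip and then rise near the N-terminus. Telling that pattern apart from the longer hydrophobic stretch of a transmembrane helix is the kind of distinction a learned model would have to draw from such features.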

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Bacterial Proteins / analysis*
  • Bacterial Proteins / chemistry
  • Databases, Protein
  • Humans
  • Mycobacterium tuberculosis / metabolism*
  • Protein Conformation
  • Protein Sorting Signals*
  • Proteome / analysis*
  • Tuberculosis / microbiology*

Substances

  • Bacterial Proteins
  • Protein Sorting Signals
  • Proteome