iT4SE-EP: Accurate Identification of Bacterial Type IV Secreted Effectors by Exploring Evolutionary Features from Two PSI-BLAST Profiles

Molecules. 2021 Apr 24;26(9):2487. doi: 10.3390/molecules26092487.

Abstract

Many gram-negative bacteria use type IV secretion systems to deliver effector molecules to a wide range of target cells. These substrate proteins, called type IV secreted effectors (T4SEs), manipulate host cell processes during infection, often resulting in severe disease or even death of the host. Identification of putative T4SEs has therefore become a very active research topic in bioinformatics because of its vital role in understanding host-pathogen interactions. PSI-BLAST profiles have been shown experimentally to provide important and discriminatory evolutionary information for various protein classification tasks. In the present study, an accurate computational predictor termed iT4SE-EP was developed for identifying T4SEs by extracting evolutionary features from the position-specific scoring matrix (PSSM) and position-specific frequency matrix (PSFM) profiles. First, four types of encoding strategies were designed to transform protein sequences into fixed-length feature vectors based on the two profiles. Then, a feature selection technique based on the random forest algorithm was used to remove redundant or irrelevant features without much loss of information. Finally, the optimal features were input into a support vector machine classifier to predict T4SEs. Our experimental results demonstrated that iT4SE-EP outperformed most existing methods on the independent dataset test.
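The encoding step can be illustrated with a minimal Python sketch. The abstract does not name the four encoding strategies, so the two encodings below (per-column means, and means grouped by residue type) are assumptions chosen only to show how a profile with one row per residue becomes a fixed-length vector; the function names and toy values are hypothetical and are not taken from iT4SE-EP.

```python
from typing import List, Sequence

# The 20 standard residues; a profile row is assumed to have one column per residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N = len(AMINO_ACIDS)

Profile = Sequence[Sequence[float]]  # L rows (one per residue) x 20 columns


def check_profile(sequence: str, profile: Profile) -> None:
    """Reject profiles whose shape does not match the sequence."""
    if not sequence or len(profile) != len(sequence):
        raise ValueError("profile needs one row per residue of a non-empty sequence")
    if any(len(row) != N for row in profile):
        raise ValueError("each profile row needs 20 columns")


def column_means(profile: Profile) -> List[float]:
    """Average every column over all positions: 20 features."""
    length = len(profile)
    return [sum(row[j] for row in profile) / length for j in range(N)]


def residue_grouped_means(sequence: str, profile: Profile) -> List[float]:
    """Average the rows belonging to each residue type: 20 x 20 = 400 features."""
    sums = [[0.0] * N for _ in range(N)]
    counts = [0] * N
    for residue, row in zip(sequence, profile):
        a = AMINO_ACIDS.find(residue)
        if a < 0:
            continue  # skip non-standard residues such as X
        counts[a] += 1
        for j in range(N):
            sums[a][j] += row[j]
    # Residue types absent from the sequence contribute zeros.
    return [
        sums[a][j] / counts[a] if counts[a] else 0.0
        for a in range(N)
        for j in range(N)
    ]


def encode(sequence: str, pssm: Profile, psfm: Profile) -> List[float]:
    """Concatenate both encodings for both profiles: 2 x (20 + 400) = 840 features."""
    features: List[float] = []
    for profile in (pssm, psfm):
        check_profile(sequence, profile)
        features.extend(column_means(profile))
        features.extend(residue_grouped_means(sequence, profile))
    return features


# Toy input (made-up numbers, not real PSI-BLAST output).
toy_seq = "ACA"
toy_pssm = [[1.0] * N, [4.0] * N, [3.0] * N]
toy_psfm = [[0.05] * N for _ in toy_seq]
vector = encode(toy_seq, toy_pssm, toy_psfm)
print(len(vector))  # 840
```

The vector length depends only on the number of profile columns, not on the sequence length, which is what allows proteins of different lengths to be passed to a feature selector and a classifier.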

Keywords: position-specific frequency matrix; position-specific scoring matrix; random forest; support vector machine; type IV secreted effectors.

MeSH terms

  • Amino Acid Sequence / genetics
  • Bacterial Infections / drug therapy
  • Bacterial Infections / genetics
  • Bacterial Infections / microbiology
  • Computational Biology
  • Evolution, Molecular*
  • Gram-Negative Bacteria / genetics*
  • Gram-Negative Bacteria / pathogenicity
  • Host-Pathogen Interactions / genetics*
  • Humans
  • Type IV Secretion Systems / chemistry
  • Type IV Secretion Systems / genetics*

Substances

  • Type IV Secretion Systems