Virulence factor prediction in Streptococcus pyogenes using classification and clustering based on microarray data

Appl Microbiol Biotechnol. 2012 Mar;93(5):2091-8. doi: 10.1007/s00253-012-3917-3. Epub 2012 Feb 4.

Abstract

Interesting biological information, such as gene expression (microarray) data, can be extracted from publicly available genomic data. To narrow down the wide range of possible wet-lab experiments, global high-throughput data and available knowledge should first be used to infer biological knowledge and generate biological hypotheses. Here, based on microarray data, we propose using clustering and classification methods that have become very popular and are implemented in freely available software to predict the participation in virulence mechanisms of different proteins encoded by genes of the pathogen Streptococcus pyogenes. Confidence in the predictions is based on the classification errors for known genes and on repeated prediction by more than three methods. Special emphasis is placed on the nonlinear kernel classification methods used. We propose a list of interesting candidates that could be virulence factors or could participate in the virulence process of S. pyogenes. Biological validation should start from this list of candidates, as they behave similarly to known virulence factors.
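The confidence criterion described in the abstract (classification error on known genes, plus repeated prediction by more than three methods) can be illustrated with a minimal sketch. This is not the authors' implementation: the method names, gene identifiers, and function names below are hypothetical placeholders, and each method's output is assumed to be simply the set of genes it flags as virulence-related.

```python
from collections import Counter


def classification_error(predicted_positive, known_positive, known_negative):
    """Fraction of genes with a known label that one method labels incorrectly."""
    known_positive = set(known_positive)
    known = known_positive | set(known_negative)
    if not known:
        raise ValueError("need at least one gene with a known label")
    predicted_positive = set(predicted_positive)
    # A gene is misclassified when its predicted label differs from its known label.
    wrong = sum(
        (gene in predicted_positive) != (gene in known_positive)
        for gene in known
    )
    return wrong / len(known)


def consensus_candidates(predictions, known_positive, threshold=3):
    """Genes flagged by more than `threshold` methods that are not already known."""
    known_positive = set(known_positive)
    votes = Counter()
    for genes in predictions.values():
        votes.update(set(genes))  # each method votes at most once per gene
    return sorted(
        gene for gene, n in votes.items()
        if n > threshold and gene not in known_positive
    )


# Hypothetical toy data: identifiers and method names are illustrative only.
known_vf = {"vf1", "vf2"}    # genes assumed to be known virulence factors
known_non = {"hk1", "hk2"}   # genes assumed to be known non-virulence genes
predictions = {
    "svm_rbf": {"vf1", "vf2", "cand1", "cand2"},
    "knn": {"vf1", "vf2", "cand1", "hk1"},
    "tree": {"vf1", "cand1", "cand2"},
    "kmeans": {"vf1", "vf2", "cand1"},
    "hierarchical": {"vf2", "cand2"},
}

errors = {
    name: classification_error(genes, known_vf, known_non)
    for name, genes in predictions.items()
}
candidates = consensus_candidates(predictions, known_vf)
```

In this toy setting only the gene flagged by four of the five methods passes the "more than three methods" rule, and the per-method error on known genes gives a rough indication of how much weight each method's vote deserves.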

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins / genetics
  • Cluster Analysis
  • Computational Biology / methods*
  • Microarray Analysis
  • Streptococcus pyogenes / classification
  • Streptococcus pyogenes / genetics*
  • Streptococcus pyogenes / pathogenicity*
  • Transcriptome*
  • Virulence Factors / genetics*

Substances

  • Bacterial Proteins
  • Virulence Factors