Learning virulent proteins from integrated query networks

BMC Bioinformatics. 2012 Dec 2;13:321. doi: 10.1186/1471-2105-13-321.

Abstract

Background: Methods of weakening and attenuating pathogens' abilities to infect and propagate in a host, thus allowing the natural immune system to more easily decimate invaders, have gained attention as alternatives to broad-spectrum targeting approaches. The following work describes a technique for identifying proteins involved in virulence by relying on latent information computationally gathered across biological repositories; the technique is applicable to both generic and specific virulence categories.

Results: A lightweight method for data integration is used, which links information regarding a protein via a path-based query graph. A weighting method is then applied to the query graphs so that they can serve as input to various statistical classification methods for discrimination, and the combined use of data integration and learning methods is tested on the problems of generalized and specific virulence function prediction.
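
As a rough illustration of the idea in the Results paragraph above, the sketch below turns a toy path-based query graph into a weighted feature vector that a statistical classifier could consume. It is not the authors' implementation. The node names, the edge confidences, the `max_depth` cutoff, and the choice of weighting (the product of edge weights along each path, summed over paths) are all assumptions made for this example, and `path_features` and `to_vector` are hypothetical helpers.

```python
from collections import defaultdict

# Hypothetical toy query graph. Each node stands for a record in some
# repository; each edge is a cross-reference with an assumed confidence
# in (0, 1]. None of these identifiers come from the paper.
EDGES = {
    "protein:P1": [("uniprot:Q1", 1.0), ("blast:H1", 0.5)],
    "uniprot:Q1": [("go:pathogenesis", 0.8)],
    "blast:H1": [("go:pathogenesis", 0.5), ("go:transport", 1.0)],
}


def path_features(start, edges, max_depth=3):
    """Weight every node reachable from `start` within `max_depth` edges.

    Assumed weighting scheme: a path's weight is the product of its edge
    weights, and a node's feature value is the sum over all simple paths
    that reach it.
    """
    features = defaultdict(float)
    stack = [(start, 1.0, (start,))]  # (node, path weight so far, path)
    while stack:
        node, weight, path = stack.pop()
        if len(path) - 1 >= max_depth:
            continue  # this path already uses max_depth edges
        for neighbor, edge_weight in edges.get(node, []):
            if neighbor in path:
                continue  # keep paths simple (no revisited nodes)
            new_weight = weight * edge_weight
            features[neighbor] += new_weight
            stack.append((neighbor, new_weight, path + (neighbor,)))
    return dict(features)


def to_vector(features, vocabulary):
    """Project a feature dict onto a fixed, ordered vocabulary of nodes."""
    return [features.get(term, 0.0) for term in vocabulary]


if __name__ == "__main__":
    feats = path_features("protein:P1", EDGES)
    vocab = ["go:pathogenesis", "go:transport"]
    print(to_vector(feats, vocab))
```

Vectors built this way for proteins with known labels could then be passed to any standard classifier. Which weighting scheme and which classifiers the paper actually uses is not stated in this abstract.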

Conclusions: This approach improves the coverage of functional data available for a protein. Moreover, while depending largely on noisy and potentially non-curated data from public sources, we find it outperforms other techniques for identifying general virulence factors, as well as baseline remote homology detection methods for specific virulence categories.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Data Interpretation, Statistical
  • Databases, Protein
  • Proteins / chemistry
  • Proteins / classification*
  • Sequence Analysis, Protein / methods*
  • Sequence Analysis, Protein / statistics & numerical data*
  • Virulence
  • Virulence Factors / chemistry
  • Virulence Factors / classification*

Substances

  • Proteins
  • Virulence Factors