Detection of atypical genes in virus families using a one-class SVM

BMC Genomics. 2014 Oct 20;15(1):913. doi: 10.1186/1471-2164-15-913.

Abstract

Background: The diversity of viruses, the absence of universally common genes among them, and their ability to act as carriers of genetic material make it very difficult to assess the evolutionary paths of viral genes. One important factor contributing to this complexity is horizontal gene transfer.

Results: We explore the possibility of systematically identifying atypical genes within virus families, including viruses whose genomes are not encoded by double-stranded DNA. Our method is based on statistical features of genes: in genes that were recently subject to horizontal gene transfer, these features differ from those of the genome in which they are observed. We employ a one-class SVM approach to detect atypical genes within a virus family based on their statistical signatures and without explicit knowledge of the source species. The simplicity of the statistical features used makes the method applicable to various viruses irrespective of their genome size or type.
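
As a rough illustration of the general idea (not the authors' package), the sketch below scores every gene of one virus family with a one-class SVM. The abstract does not name the statistical features, so dinucleotide frequencies are an assumption made here, as are the function names, the RBF kernel, and the `nu` value; the only external dependency is scikit-learn's `OneClassSVM`.

```python
from itertools import product

# All 16 dinucleotides in a fixed order. Using dinucleotide frequencies as
# the "statistical signature" is an assumption made for this sketch.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(sequence):
    """Return the relative frequency of each of the 16 dinucleotides."""
    seq = sequence.upper().replace("U", "T")  # treat RNA sequences like DNA
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]


def flag_atypical_genes(gene_sequences, nu=0.1):
    """Fit a one-class SVM on all genes of a family; return one bool per gene.

    True marks a gene that the fitted model places outside the region
    occupied by most genes of the family. `nu` (hypothetical default) is an
    upper bound on the fraction of genes treated as outliers.
    """
    # Imported here so the feature code above works without scikit-learn.
    from sklearn.svm import OneClassSVM

    features = [dinucleotide_frequencies(s) for s in gene_sequences]
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    labels = model.fit_predict(features)  # +1 = inlier, -1 = outlier
    return [bool(label == -1) for label in labels]
```

Calling `flag_atypical_genes(list_of_gene_sequences)` returns one flag per input gene. No source species is needed because the model is trained only on the family's own genes, which is the property that makes a one-class formulation a natural fit here.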

Conclusions: On simulated data, the method can robustly identify alien genes irrespective of the type of nucleic acid that encodes the viral genome. It also compares well to results obtained in related studies for double-stranded DNA viruses. Its value in practice is confirmed by the identification of isolated examples of horizontal gene transfer events that have already been described in the literature. A Python package implementing the method and the results for the analyzed virus families are available at http://svm-agp.bioinf.mpi-inf.mpg.de.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Transfer, Horizontal
  • Genes, Viral*
  • Genome Size
  • Models, Genetic
  • Models, Statistical
  • Support Vector Machine*
  • Viruses / classification
  • Viruses / genetics*