Human polyomaviruses identification by logic mining techniques

Virol J. 2012 Mar 2:9:58. doi: 10.1186/1743-422X-9-58.

Abstract

Background: Differences in genomic sequences are crucial for the classification of viruses into different species. In this work, viral DNA sequences belonging to the human polyomaviruses BKPyV, JCPyV, KIPyV, WUPyV, and MCPyV are analyzed with a logic data mining method in order to identify the nucleotides that distinguish these five viruses from one another.

Results: The approach presented in this work is successful: it discovers several logic rules that effectively characterize the five polyomaviruses studied. The identified logic rules precisely separate each viral type from the others and can assign an unknown DNA sequence to one of the five analyzed polyomaviruses.

Conclusions: The data mining analysis is performed both on the complete sequences of the viruses and, separately, on the sequences of the individual gene regions; in both cases it obtains extremely high correct recognition rates.
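
Illustrative sketch (not part of the published abstract): the idea of a logic rule that assigns a sequence to a viral type can be pictured as a conjunction of "nucleotide X at aligned position p" conditions. The abstract does not list the actual rules, so every position, nucleotide, and name below (HYPOTHETICAL_RULES, matches, classify) is an invented placeholder, and the code assumes the sequences are already aligned. It shows only how such rules could be applied, not how the authors' method learns them.

```python
from typing import Dict, List, Optional, Tuple

# A rule is a conjunction of (aligned position, nucleotide) conditions.
Rule = List[Tuple[int, str]]

# HYPOTHETICAL rules: positions and nucleotides are invented for illustration
# and are NOT the rules reported in the paper.
HYPOTHETICAL_RULES: Dict[str, Rule] = {
    "BKPyV": [(0, "A"), (3, "G")],
    "JCPyV": [(0, "A"), (3, "T")],
    "KIPyV": [(0, "C"), (5, "A")],
    "WUPyV": [(0, "C"), (5, "G")],
    "MCPyV": [(0, "T")],
}


def matches(sequence: str, rule: Rule) -> bool:
    """Return True if every condition of the rule holds for the sequence."""
    return all(
        pos < len(sequence) and sequence[pos] == nucleotide
        for pos, nucleotide in rule
    )


def classify(sequence: str, rules: Dict[str, Rule]) -> Optional[str]:
    """Return the label whose rule matches, or None if zero or several match."""
    hits = [label for label, rule in rules.items() if matches(sequence, rule)]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    # Toy aligned fragment, also invented for illustration.
    print(classify("ACCGTA", HYPOTHETICAL_RULES))
```

Returning None when no rule or more than one rule fires is a design choice of this sketch: it keeps an ambiguous sequence from being silently forced into a class.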

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Computational Biology / methods*
  • DNA, Viral / chemistry*
  • Data Mining*
  • Humans
  • Polyomavirus / classification*
  • Polyomavirus / genetics*

Substances

  • DNA, Viral