Prediction of high-risk types of human papillomaviruses using statistical model of protein "sequence space"

Comput Math Methods Med. 2015:2015:756345. doi: 10.1155/2015/756345. Epub 2015 Apr 20.

Abstract

Discriminating high-risk types of human papillomaviruses (HPVs) plays an important role in the diagnosis and treatment of cervical cancer. Several computational methods based on protein sequence and structure information have recently been proposed, but information from related proteins has not yet been used. In this paper, we propose using protein "sequence space" to capture this information and use it to predict high-risk types of HPVs. The proposed method was tested on 68 samples with known HPV types and 4 samples with unknown types, and was compared with the available approaches. The proposed method achieved the best performance among all evaluated methods, with an accuracy of 95.59% and an F1-score of 90.91%, which indicates that protein "sequence space" could potentially be used to improve the prediction of high-risk types of HPVs.
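
To make the two reported metrics concrete, the following minimal Python sketch shows how accuracy and F1-score are computed from a binary confusion matrix. The abstract does not report the underlying counts; the values of `tp`, `fp`, `fn`, and `tn` below are hypothetical, chosen only because they sum to the 68 typed samples and happen to reproduce the reported percentages. Treating high-risk as the positive class is likewise an assumption.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion matrix (NOT taken from the paper):
# positive class = high-risk HPV type, 68 samples in total.
tp, fp, fn, tn = 15, 1, 2, 50

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"accuracy = {acc:.2%}, F1 = {f1:.2%}")
```

With these illustrative counts, 65 of 68 samples are classified correctly, and the F1-score depends only on the true positives and the two kinds of error, not on the true negatives.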

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Capsid Proteins / chemistry
  • Computational Biology / methods*
  • Databases, Factual
  • Female
  • Humans
  • Models, Statistical
  • Molecular Sequence Data
  • Mutation
  • Open Reading Frames
  • Papillomaviridae / genetics*
  • Papillomavirus Infections / diagnosis
  • Papillomavirus Infections / virology*
  • Reproducibility of Results
  • Risk
  • Software
  • Uterine Cervical Neoplasms / virology*
  • Viral Proteins / chemistry

Substances

  • Capsid Proteins
  • Viral Proteins