An entropy-based approach for the identification of phylogenetically informative genomic regions of Papillomavirus

Infect Genet Evol. 2011 Dec;11(8):2026-33. doi: 10.1016/j.meegid.2011.09.013. Epub 2011 Sep 23.

Abstract

The papillomaviruses form a highly diverse group that infects mammals, birds and reptiles. We know little about their genetic diversity and, therefore, about the evolutionary mechanisms that drive the diversity of these viruses. Genomic sequences of papillomaviruses are highly divergent, so it is important to develop methods that select the most phylogenetically informative sites. This study aimed to use a novel entropy-based approach to select suitable genomic regions from which to infer the phylogeny of papillomaviruses. Comparative genomic analyses were performed to assess the genetic variability of each gene of Papillomaviridae family members. Regions with low entropy were selected to reconstruct papillomavirus phylogenetic trees using four different methods. This methodology allowed us to identify regions that are conserved among papillomaviruses that infect different hosts. This is important because, despite the huge variation among papillomavirus genomes, we were able to find regions that are clearly shared among them and that present low levels of information complexity, from which phylogenetic predictions can be made. This approach allowed us to obtain robust topologies from relatively small datasets. The results indicate that the entropy approach can successfully select regions of the genome that are good markers from which to infer phylogenetic relationships while using less computational time, making the estimation of large phylogenies more accessible.
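The abstract does not give the entropy formula or the selection criteria, so the following is only a minimal sketch of the general idea, assuming per-column Shannon entropy over a multiple sequence alignment and a sliding window with a mean-entropy cutoff. The function names, the window size, the threshold and the toy alignment are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps.

    Assumption: gaps ('-') are skipped. An all-gap column returns 0.0,
    so real data should be filtered for gap-rich columns beforehand.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_entropy_windows(alignment, window=3, threshold=0.5):
    """Return (start, end, mean_entropy) for each window of `window`
    columns whose mean entropy is at or below `threshold`.

    `alignment` is a list of equal-length aligned sequences.
    `window` and `threshold` are illustrative values, not the paper's.
    """
    length = len(alignment[0])
    entropies = [
        column_entropy([seq[i] for seq in alignment]) for i in range(length)
    ]
    hits = []
    for start in range(length - window + 1):
        mean_h = sum(entropies[start:start + window]) / window
        if mean_h <= threshold:
            hits.append((start, start + window, mean_h))
    return hits


# Hypothetical toy alignment: columns 0-2 are fully conserved,
# column 3 is maximally variable, column 4 is partly variable.
toy_alignment = [
    "ATGCAT",
    "ATGGAT",
    "ATGTTT",
    "ATGAAT",
]

for start, end, mean_h in low_entropy_windows(toy_alignment):
    print(f"columns {start}-{end - 1}: mean entropy {mean_h:.3f} bits")
```

Under these assumptions, only the window covering the three conserved columns passes the cutoff. Windows selected this way could then be concatenated and passed to a tree-building program; how the original study combined regions and which thresholds it applied is not stated in the abstract.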

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Biological Evolution
  • Databases, Genetic
  • Entropy*
  • Genetic Variation
  • Genome
  • Genomics / methods*
  • Molecular Sequence Data
  • Open Reading Frames
  • Papillomaviridae / classification*
  • Papillomaviridae / genetics*
  • Phylogeny
  • Sequence Alignment
  • Sequence Analysis, DNA
  • Viral Proteins / chemistry
  • Viral Proteins / genetics

Substances

  • Viral Proteins