Use of mutual information arrays to predict coevolving sites in the full length HIV gp120 protein for subtypes B and C

Virol Sin. 2011 Apr;26(2):95-104. doi: 10.1007/s12250-011-3188-7. Epub 2011 Apr 7.

Abstract

It is well established that different sites within a protein evolve at different rates according to their role within the protein, and that some sites mutate in a correlated fashion; identifying these correlated mutations can aid in tasks such as ab initio protein structure prediction, structure-function analysis, or sequence alignment. Mutual information is a standard measure of coevolution between two sites, but its application is limited by a low signal-to-noise ratio. In this work we report a preliminary study investigating whether larger sequence sets can circumvent this problem, calculating mutual information arrays for two sets of drug-naïve sequences of the HIV gp120 protein, one for subtype B and one for subtype C. Our results suggest that while larger sequence sets can improve the signal-to-noise ratio, the gain is offset by the high mutation rate of HIV, which makes it more difficult to achieve consistent alignments. Nevertheless, we were able to predict a number of coevolving sites that are supported by previous experimental studies, and to identify a region close to the C terminus of the protein that is highly variable in subtype C but highly conserved in subtype B.
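To make the central quantity concrete, the following is a minimal sketch of how a mutual information array can be computed from a multiple sequence alignment, using the standard Shannon definition MI(i, j) = Σ p(a, b) log2[p(a, b) / (p(a) p(b))] over the residue pairs observed at columns i and j. It is not taken from the paper: the function names `column_mi` and `mi_array` are hypothetical, gap characters are treated as ordinary symbols, no noise correction is applied, and the diagonal is simply left at zero.

```python
import math
from collections import Counter


def column_mi(col_a, col_b):
    """Shannon mutual information (in bits) between two alignment columns.

    Assumption: every character, including a gap such as '-', is counted
    as an ordinary symbol.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_a)
    if n == 0:
        return 0.0
    count_a = Counter(col_a)                 # residue counts at site i
    count_b = Counter(col_b)                 # residue counts at site j
    count_ab = Counter(zip(col_a, col_b))    # joint residue-pair counts
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def mi_array(alignment):
    """Symmetric matrix of pairwise MI values for all column pairs.

    `alignment` is a list of equal-length aligned sequences (strings).
    The diagonal is left at 0.0 in this sketch.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    matrix = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + 1, length):
            value = column_mi(columns[i], columns[j])
            matrix[i][j] = matrix[j][i] = value
    return matrix


# Toy alignment (made up for illustration): columns 0 and 1 vary together,
# column 2 is fully conserved.
toy_alignment = ["AKL", "AKL", "GEL", "GEL"]
toy_mi = mi_array(toy_alignment)
```

In the toy alignment, the first two columns change in lockstep and so share one bit of information, while the conserved third column carries none. With real data, small sequence sets produce spurious non-zero values between unrelated sites, which is the signal-to-noise limitation the abstract refers to.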

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Base Sequence
  • Computational Biology / methods*
  • Evolution, Molecular*
  • HIV Envelope Protein gp120 / chemistry*
  • HIV Envelope Protein gp120 / genetics*
  • HIV-1 / chemistry
  • HIV-1 / classification
  • HIV-1 / genetics*
  • Molecular Conformation
  • Molecular Sequence Data
  • Mutation
  • Phylogeny
  • Sequence Alignment

Substances

  • HIV Envelope Protein gp120