Construction of non-symmetric substitution matrices derived from proteomes with biased amino acid distributions

C R Biol. 2005 May;328(5):445-53. doi: 10.1016/j.crvi.2005.02.002.

Abstract

Automatic comparison of compositionally biased genomes, such as that of the malarial causative agent Plasmodium falciparum (82% adenosine + thymidine), with genomes of average composition is currently limited. Popular tools such as BLAST require that amino acid distributions be similar in the aligned sequences, yet the P. falciparum genome is so biased that six amino acids account for more than 50% of its protein composition. One reason for the failure of comparison methods lies in the compositional difference between the query and subject proteomes, which amino acid substitution matrices do not take into account. This paper introduces a method to derive substitution matrices, in particular BLOSUM 62, within the framework of information theory. The method allows the construction of non-symmetric matrices that take into account the differing amino acid distributions of the two proteomes. The dirAtPf family of matrices, which allows the comparison of P. falciparum and A. thaliana, is given as an example. The paper further provides an information-theoretic analysis of the resulting matrices, supporting the discrimination advantage they bring.
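
As a rough illustration of why differing compositions lead to a non-symmetric matrix, the sketch below computes standard log-odds scores from a joint distribution of aligned residue pairs, using one background distribution for the query proteome and another for the subject proteome. It is not the paper's derivation: the two-letter alphabet, the frequencies, and the function names (`marginals`, `log_odds_matrix`, `relative_entropy`) are hypothetical and chosen only to keep the example small.

```python
import math

def marginals(joint):
    """Row (query) and column (subject) background frequencies of a joint table."""
    bg_query = {a: sum(row.values()) for a, row in joint.items()}
    bg_subject = {}
    for row in joint.values():
        for b, q_ab in row.items():
            bg_subject[b] = bg_subject.get(b, 0.0) + q_ab
    return bg_query, bg_subject

def log_odds_matrix(joint, bg_query, bg_subject):
    """s[a][b] = log2(q_ab / (p_a * p'_b)), with p from the query and p' from the subject."""
    return {
        a: {b: math.log2(q_ab / (bg_query[a] * bg_subject[b])) for b, q_ab in row.items()}
        for a, row in joint.items()
    }

def relative_entropy(joint, bg_query, bg_subject):
    """Average information per aligned pair, in bits."""
    return sum(
        q_ab * math.log2(q_ab / (bg_query[a] * bg_subject[b]))
        for a, row in joint.items()
        for b, q_ab in row.items()
    )

# Hypothetical joint frequencies of aligned pairs (query residue -> subject residue).
# The query side is enriched in N, the subject side in G; values are made up.
joint = {
    "N": {"N": 0.30, "G": 0.25},
    "G": {"N": 0.05, "G": 0.40},
}

bg_query, bg_subject = marginals(joint)
scores = log_odds_matrix(joint, bg_query, bg_subject)
h = relative_entropy(joint, bg_query, bg_subject)

# Because the joint table is not symmetric, the N->G and G->N scores differ.
print(scores["N"]["G"], scores["G"]["N"])
print(h)
```

In this toy setting the score for aligning a query N with a subject G differs from the score for a query G with a subject N, which is the kind of asymmetry a symmetric matrix such as BLOSUM 62 cannot express. The relative entropy gives one information-theoretic measure of how well a matrix separates related pairs from chance pairings.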

MeSH terms

  • Amino Acid Sequence
  • Amino Acid Substitution
  • Animals
  • Genome, Protozoan
  • Models, Genetic
  • Plasmodium falciparum / genetics*
  • Proteome*

Substances

  • Proteome