Construction of protein dendrograms based on amino acid indices and Discrete Fourier Transform

Annu Int Conf IEEE Eng Med Biol Soc. 2014:2014:816-9. doi: 10.1109/EMBC.2014.6943716.

Abstract

Existing methods in the literature build a dendrogram from the pairwise percent identity, which measures the similarity between two protein sequences. Because this is a parametric measure of similarity and different parameters may yield different results, it does not guarantee that the globally optimal similarity values will be found. As protein dendrogram construction is used in other areas, such as multiple protein sequence alignment, it is very important that the most closely related protein sequences be identified and aligned first. Furthermore, dendrograms built from pairwise percent identity do not take the physical characteristics of protein sequences and amino acids into account. In this paper, a new method is proposed for constructing protein sequence dendrograms. In this method, the Discrete Fourier Transform is used to construct the distance matrix, in combination with multiple amino acid indices that encode the protein sequences as numerical sequences. To show the applicability and robustness of the proposed method, a case study is presented using nine Cluster of Differentiation 4 protein sequences extracted from the UniProt online database.
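
The general idea of a DFT-based distance matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which amino acid indices, spectral representation, padding scheme, or distance measure were used. The sketch assumes a single index (the Kyte-Doolittle hydropathy scale), zero-padding to a common length, the power spectrum as the feature vector, and Euclidean distance. The function names and the three short peptide strings are hypothetical and are not the CD4 sequences from the case study.

```python
import cmath
import math

# Kyte-Doolittle hydropathy scale, used here as ONE example amino acid index.
# Assumption: the paper combines multiple indices; which ones is not stated.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, index=KYTE_DOOLITTLE):
    """Map each residue letter to its numerical index value."""
    return [index[residue] for residue in sequence]


def power_spectrum(signal, n):
    """Zero-pad the signal to length n and return its DFT power spectrum.

    Only frequencies 0..n//2 are kept, because the spectrum of a real
    signal is symmetric. Zero-padding is an assumption of this sketch.
    """
    padded = list(signal) + [0.0] * (n - len(signal))
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(padded)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def distance_matrix(sequences):
    """Pairwise Euclidean distances between the power spectra (assumed metric)."""
    n = max(len(s) for s in sequences)
    spectra = [power_spectrum(encode(s), n) for s in sequences]
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in spectra]
        for p in spectra
    ]


# Made-up toy peptides, for illustration only.
toy_sequences = ["MKVLAAGIV", "MKVLSAGIV", "GSTPDDEKR"]
for row in distance_matrix(toy_sequences):
    print(["%.2f" % d for d in row])
```

The resulting symmetric matrix could then be passed to any hierarchical clustering routine to draw a dendrogram. The choice of linkage is likewise not specified in the abstract.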

MeSH terms

  • Algorithms
  • Amino Acid Sequence*
  • Amino Acids / chemistry
  • Computational Biology / methods*
  • Fourier Analysis*
  • Proteins / chemistry*
  • Proteins / classification*
  • Sequence Alignment / methods*

Substances

  • Amino Acids
  • Proteins