Similarity/Dissimilarity analysis of protein sequences based on a new spectrum-like graphical representation

Evol Bioinform Online. 2014 Jun 12;10:87-96. doi: 10.4137/EBO.S14713. eCollection 2014.

Abstract

Sequence comparison is one of the foundations of bioinformatics and can be used to study evolutionary relationships among sequences. In this study, a 2D spectrum-like graphical representation of protein sequences is presented based on the hydrophobicity scale of amino acids. The frequencies of the amplitudes of 4-subsequences are used to characterize the spectrum-like graph, and a 17D vector serves as the descriptor of a protein sequence. The χ² value of a compatibility test is then computed. The new similarity analysis approach is illustrated on all the protein sequences encoded by the mitochondrial genomes of 20 different species. Finally, a comparison with the ClustalW method demonstrates the utility of our method.
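The abstract does not spell out how amplitudes are computed or why the descriptor has 17 dimensions. The sketch below is one plausible reading, under loudly stated assumptions. Each residue is mapped to a hypothetical integer hydrophobicity level from 0 to 4 (the grouping in `LEVEL` is illustrative, loosely following the Kyte-Doolittle ordering, and is not the paper's scale). The amplitude of a 4-subsequence is taken to be the sum of its four levels, which yields 4 × 4 + 1 = 17 possible values. The χ² step is shown as a standard Pearson homogeneity statistic on the 2 × 17 table of amplitude counts for two sequences. All names (`LEVEL`, `spectrum_counts`, `descriptor`, `chi_square`) are hypothetical.

```python
"""Sketch: 17-D spectrum-like descriptor and chi-square comparison.

All scales and formulas here are ASSUMPTIONS for illustration only.
"""

# HYPOTHETICAL 5-level hydrophobicity grouping (0 = most hydrophilic,
# 4 = most hydrophobic), loosely following the Kyte-Doolittle ordering.
# This is NOT the scale used in the paper.
LEVEL = {
    **dict.fromkeys("IVLF", 4),
    **dict.fromkeys("CMA", 3),
    **dict.fromkeys("GTSW", 2),
    **dict.fromkeys("YP", 1),
    **dict.fromkeys("HEQDNKR", 0),
}

WINDOW = 4                        # length of the sliding 4-subsequence
MAX_LEVEL = 4                     # highest level used in LEVEL
N_BINS = WINDOW * MAX_LEVEL + 1   # amplitudes 0..16 -> 17 bins


def spectrum_counts(seq: str) -> list[int]:
    """Count how often each amplitude occurs over all 4-subsequences.

    Assumption: the amplitude of a 4-subsequence is the sum of its
    residues' levels. Symbols not in LEVEL (e.g. 'X') are dropped.
    """
    levels = [LEVEL[aa] for aa in seq.upper() if aa in LEVEL]
    counts = [0] * N_BINS
    for i in range(len(levels) - WINDOW + 1):
        counts[sum(levels[i:i + WINDOW])] += 1
    return counts


def descriptor(seq: str) -> list[float]:
    """17-D vector of amplitude frequencies (counts normalised to sum to 1)."""
    counts = spectrum_counts(seq)
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence too short to contain a 4-subsequence")
    return [c / total for c in counts]


def chi_square(c1: list[int], c2: list[int]) -> float:
    """Pearson chi-square homogeneity statistic for a 2 x k table of counts.

    Smaller values indicate more compatible amplitude distributions.
    """
    r1, r2 = sum(c1), sum(c2)
    if r1 == 0 or r2 == 0:
        raise ValueError("both count vectors must be non-empty")
    n = r1 + r2
    stat = 0.0
    for o1, o2 in zip(c1, c2):
        col = o1 + o2
        if col == 0:
            continue  # a bin unused by both sequences contributes nothing
        e1 = r1 * col / n  # expected count for sequence 1 in this bin
        e2 = r2 * col / n  # expected count for sequence 2 in this bin
        stat += (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2
    return stat
```

For two sequences `s1` and `s2`, `chi_square(spectrum_counts(s1), spectrum_counts(s2))` gives a dissimilarity-like score. Computing it for every pair would produce a matrix that could, for instance, be compared against ClustalW-based groupings. Raw counts rather than normalised frequencies are passed to the χ² step because the statistic depends on sample sizes.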

Keywords: compatibility test; protein sequences; similarities/dissimilarities; spectral representation.