Use of 2D FFT and DTW in Protein Sequence Comparison

Protein J. 2024 Feb;43(1):1-11. doi: 10.1007/s10930-023-10160-2. Epub 2023 Oct 17.

Abstract

Protein sequence comparison remains a challenging task for researchers because of the computational complexity introduced by the 20 amino acids, compared with only four nucleotides in genome sequences. Further, protein sequences from different species have different lengths, which poses additional challenges for researchers developing methods, especially alignment-free methods, to compare protein sequences. In this work, an efficient technique for comparing protein sequences is developed using a graphical representation. First, the 20 amino acids are classified into four groups based on polar class, narrowing the representational range from 20 to 4. Then a unit vector technique based on a two-quadrant Cartesian system is proposed to provide a new two-dimensional graphical representation of the protein sequence. Two approaches are then proposed to cope with the varying lengths of protein sequences from various species: one uses Dynamic Time Warping (DTW), while the other uses a two-dimensional Fast Fourier Transform (2D FFT). Next, the effectiveness of these two techniques is analyzed using two evaluation criteria: quantitative measures based on symmetric distance (SD), and computational speed. The analysis is performed on five data sets of 9 ND4, 9 ND5, 9 ND6, 12 Baculovirus, and 24 TF proteins under the two methods. The FFT-based method is found to produce the same results as DTW but in less computational time. The results of the proposed method agree with the known biological reference. Further, the present method produces better clustering than the existing ones.
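The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the four-class polarity grouping in `POLAR_CLASSES` is one common textbook grouping, the angles in `CLASS_ANGLES_DEG` are hypothetical unit-vector directions chosen only so that every step lies in the upper two quadrants, and `fft2_distance` (zero-padding to a common length and comparing magnitude spectra) is one plausible way a 2D FFT could handle unequal lengths. The function names and the toy sequences are also illustrative.

```python
import numpy as np

# ASSUMPTION: a common four-class polarity grouping; the paper's grouping may differ.
POLAR_CLASSES = {
    "nonpolar": "AVLIPFWMG",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}

# ASSUMPTION: hypothetical directions, all inside quadrants I and II (0 to 180 degrees).
CLASS_ANGLES_DEG = {"nonpolar": 30.0, "polar": 60.0, "acidic": 120.0, "basic": 150.0}

# Map each amino acid letter to the unit vector of its class.
UNIT_VECTORS = {
    aa: (np.cos(np.radians(CLASS_ANGLES_DEG[cls])),
         np.sin(np.radians(CLASS_ANGLES_DEG[cls])))
    for cls, residues in POLAR_CLASSES.items()
    for aa in residues
}


def curve(sequence):
    """Return the 2D curve (N x 2 array) traced by summing unit vectors."""
    steps = np.array([UNIT_VECTORS[aa] for aa in sequence.upper()])
    return np.cumsum(steps, axis=0)


def dtw_distance(a, b):
    """Classic O(N*M) dynamic time warping cost between two 2D curves."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean point distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def fft2_distance(a, b):
    """ASSUMPTION: zero-pad both curves to a common length, then compare
    the magnitudes of their 2D FFTs with a Euclidean norm."""
    n = max(len(a), len(b))
    fa = np.abs(np.fft.fft2(a, s=(n, 2)))
    fb = np.abs(np.fft.fft2(b, s=(n, 2)))
    return float(np.linalg.norm(fa - fb))


if __name__ == "__main__":
    # Toy sequences of different lengths (illustrative, not from the paper's data sets).
    p = curve("MKVLA")
    q = curve("MKVLDERHA")
    print("DTW distance:", dtw_distance(p, q))
    print("FFT distance:", fft2_distance(p, q))
```

Both functions accept curves of different lengths, which is the property the abstract highlights. The DTW loop is quadratic in sequence length while the FFT runs in O(n log n), which is consistent with the reported speed advantage of the FFT-based method, although the sketch makes no claim to reproduce the paper's distances or timings.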

Keywords: 2 dimensional fast Fourier transform (2D FFT); Dynamic time warping (DTW); Graphical representation; Phylogenetic tree; Symmetric distance (SD); Two-quadrant Cartesian system.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Amino Acids*
  • Proteins* / chemistry
  • Proteins* / genetics

Substances

  • Proteins
  • Amino Acids