MMV method: a new approach to compare protein sequences under binary representation

J Biomol Struct Dyn. 2024 Feb 20:1-7. doi: 10.1080/07391102.2024.2317982. Online ahead of print.

Abstract

In the present work, a new form of descriptor using minimal moment vector (MMV) is introduced to compare protein sequences in the frequency domain under their component wise binary representations. From every sequence, 20 different binary component sequences are formed, each corresponding to 20 amino acids. Each such vector is now shifted from the time domain to the frequency domain by applying the Fast Fourier Transform (FFT). Next, the power spectrum calculated from the FFT values for each component sequence is so normalized that the sum of the components equals 1. The descriptor is defined as a 20-component vector composed of the 20 second-order minimal moments calculated from the normalized spectrum of the 20 component sequences. Once the descriptor is known, the distance matrix is created by applying the Euclidean Distance measure. The phylogenetic tree is generated by applying the unweighted pair group method with the arithmetic mean (UPGMA) algorithm using Molecular Evolutionary Genetics Analysis11 (MEGA11) software. In this work, the datasets used for similarity studies are 9 NADH dehydrogenase 5 (ND5), 12 Baculoviruses, 24 Transferrins (TF) proteins, and 50 Spike Protein of coronavirus. A qualitative measure using rationalized perception is used to compare the effectiveness of the proposed method. Quantitative measure based on symmetric distance (SD) is used to compare the phylogenetic trees of the present method with those obtained by other methods. It is observed that the phylogenetic trees generated by the proposed technique are at par with their known biological references, and they produce results better than those of the earlier methods.Communicated by Ramaswamy H. Sarma.

Keywords: Euclidean distance; Minimal moment vector; UPGMA algorithm; fast Fourier transform; phylogenetic tree; power spectrum.