A GHKNN model based on the physicochemical property extraction method to identify SNARE proteins

Front Genet. 2022 Nov 23:13:935717. doi: 10.3389/fgene.2022.935717. eCollection 2022.

Abstract

SNARE proteins are of great importance, and their loss of function can lead to a variety of diseases. SNARE proteins are known as membrane fusion proteins and are crucial for mediating vesicle fusion. An accurate method for identifying SNARE proteins is therefore needed. Through extensive experiments, we have developed a binary classification model based on the graph-regularized k-local hyperplane distance nearest neighbor (GHKNN) algorithm. The model uses the physicochemical property extraction method to extract features from protein sequences and the SMOTE method to upsample those features. This combination achieves the most accurate performance in identifying the protein sequences. Finally, we compare the GHKNN-based binary classification model with other classifiers and evaluate them using four metrics: sensitivity (SN), specificity (SP), accuracy (ACC), and Matthews correlation coefficient (MCC). In these experiments, the model performs significantly better than the other classifiers.
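
As an illustration of the four evaluation metrics named above, the following minimal sketch computes SN, SP, ACC, and MCC from binary labels. It assumes the standard confusion-matrix definitions, which the abstract itself does not spell out; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
import math

def binary_metrics(y_true, y_pred):
    """Compute SN, SP, ACC and MCC for binary labels (1 = SNARE, 0 = non-SNARE).

    Assumes the standard confusion-matrix definitions; this is an
    illustrative helper, not code from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity (recall on negatives)
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}

# Toy example with made-up labels (3 positives, 5 negatives).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

MCC is the most informative of the four when classes are imbalanced, because it uses all four confusion-matrix cells, whereas ACC alone can look high when the majority class dominates.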

Keywords: GHKNN; SMOTE; SNARE proteins; identify protein sequences; physicochemical property extraction method.