A First Computational Frame for Recognizing Heparin-Binding Protein

Diagnostics (Basel). 2023 Jul 24;13(14):2465. doi: 10.3390/diagnostics13142465.

Abstract

Heparin-binding protein (HBP) is a cationic antibacterial protein derived from polymorphonuclear neutrophils and an important biomarker of infectious diseases. Correctly identifying HBP is therefore of great significance to the study of infectious diseases. This work presents the first machine-learning framework for recognizing HBP. HBP and non-HBP samples were encoded as numerical feature vectors using four sequence descriptors. These features were fed into support vector machine (SVM) and random forest (RF) classifiers, and the prediction performance of the two methods was compared on training data and independent test data; the SVM-based classifier showed the greatest potential for identifying HBP. The model achieved an auROC of 0.981 ± 0.028 on training data using 10-fold cross-validation and an overall accuracy of 95.0% on independent test data. As the first model for HBP recognition, it should support research on infectious diseases and stimulate further work in related fields.
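
A figure such as "auROC of 0.981 ± 0.028" from 10-fold cross-validation is typically a per-fold summary. The sketch below shows one way such a number could be computed, assuming the ± denotes the sample standard deviation of the per-fold auROC values (the abstract does not say which spread measure was used). The function names and the toy fold data are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """Area under the ROC curve in its pairwise (Mann-Whitney) form.

    Counts the fraction of positive/negative pairs in which the positive
    sample receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_results):
    """Return (mean, sample standard deviation) of the per-fold auROC values."""
    aucs = [auroc(labels, scores) for labels, scores in fold_results]
    return mean(aucs), stdev(aucs)


# Hypothetical toy data: (true labels, classifier scores) for three folds.
# A real 10-fold run would supply ten such pairs from held-out predictions.
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),  # perfectly separated
    ([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.2]),  # one positive ranked below a negative
    ([1, 1, 0, 0], [0.9, 0.5, 0.5, 0.2]),  # one tied pair
]
fold_mean, fold_sd = summarize_folds(folds)
print(f"auROC = {fold_mean:.3f} ± {fold_sd:.3f}")
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based computation would be preferable for large datasets.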

Keywords: amino acid composition; composition/transition/distribution; dipeptide composition; dipeptide deviation from expected mean; heparin-binding protein; support vector machine.
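
Two of the descriptors named in the keywords, amino acid composition and dipeptide composition, have simple standard definitions: the relative frequency of each of the 20 standard residues, and of each of the 400 ordered residue pairs. The following is a minimal sketch of those standard definitions, not the authors' implementation. The function names, the handling of non-standard residues, and the example sequence are assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue frequencies.

    Assumption: non-standard characters (e.g. 'X') are ignored.
    """
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair frequencies.

    Assumption: pairs containing a non-standard character are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical toy sequence, not a real HBP sequence.
example = "ACAC"
aac = amino_acid_composition(example)
dpc = dipeptide_composition(example)
features = aac + dpc  # 420 numbers that a classifier such as an SVM could consume
print(len(features))
```

Concatenating the two vectors is only one possible design; the descriptors could equally be evaluated separately or combined with the composition/transition/distribution and dipeptide-deviation features listed in the keywords.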

Grants and funding

This work was funded by the National Natural Science Foundation of China (Grant Nos. 62250028, 61863010, 11926205, 11926412, and 61873076), the National Key R&D Program of China (Grant No. 2020YFB2104400), and the Natural Science Foundation of Hainan, China (Grant Nos. 121RC538, 119MS036, and 120RC588).