Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction

Front Bioeng Biotechnol. 2020 Oct 22:8:584807. doi: 10.3389/fbioe.2020.584807. eCollection 2020.

Abstract

Thermophilicity is an important property of proteins because it can determine denaturation and cell death. Methods for distinguishing thermophilic from non-thermophilic proteins are therefore of interest and can contribute to protein design and engineering. In this article, we describe the use of feature dimension reduction and LIBSVM to identify thermophilic proteins. The highest accuracy obtained by cross-validation was 96.02%, using 119 parameters. With only 16 features, the accuracy was 93.33%. We discuss the importance of the different characteristics for identification and compare the performance of the support vector machine with that of other methods.

Keywords: amino acid; feature dimension reduction; feature selection; support vector machine; thermophilic proteins.
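
To make the idea of feature dimension reduction concrete, the following is a minimal sketch and not the authors' pipeline. This abstract does not name the features or the reduction method, so the sketch assumes amino acid composition (20 features per sequence) and an F-score ranking as the selection criterion. The function names `composition`, `f_scores`, and `select_top_k` and the toy sequences are hypothetical. The classification step is omitted: in a full workflow the reduced feature vectors would be passed to a support vector machine such as LIBSVM and evaluated by cross-validation.

```python
from collections import Counter
from statistics import mean, variance

# The 20 standard amino acids, in a fixed order that defines the feature index.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of a sequence."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def f_scores(pos, neg):
    """Score each feature by between-class separation over within-class variance.

    pos and neg are lists of feature vectors (one per protein) for the two
    classes; each class needs at least two samples for the sample variance.
    """
    scores = []
    for j in range(len(pos[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        overall = mean(p + n)
        between = (mean(p) - overall) ** 2 + (mean(n) - overall) ** 2
        within = variance(p) + variance(n)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or uninformative.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, best first."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy sequences invented for illustration only; they are not real data.
    thermo = ["MKEELRKEEIVKRLEE", "MEKRIEELKKEYERVK"]
    non_thermo = ["MSTQNSAQTSGSNQAT", "MQSNTTSAGQNSSTQA"]
    pos = [composition(s) for s in thermo]
    neg = [composition(s) for s in non_thermo]
    top = select_top_k(f_scores(pos, neg), k=5)
    print("Top-ranked residues:", [AMINO_ACIDS[j] for j in top])
```

Ranking features and keeping only the top k is one simple way to trade a small loss in accuracy for a much smaller model, which is the kind of trade-off the abstract reports between the 119-parameter and 16-feature results.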