Improving the accuracy of predicting disulfide connectivity by feature selection

J Comput Chem. 2010 May;31(7):1478-85. doi: 10.1002/jcc.21433.

Abstract

Disulfide bonds are primary covalent cross-links formed between two cysteine residues in the same or different polypeptide chains, and they play important roles in the folding and stability of proteins. However, computational prediction of disulfide connectivity directly from protein primary sequences is challenging for two reasons: disulfide bonds are nonlocal in sequence, and the number of possible disulfide patterns grows exponentially as the number of cysteine residues increases. In previous studies, disulfide connectivity prediction was usually performed in a high-dimensional feature space, which can cause a variety of problems in statistical learning, such as the curse of dimensionality, overfitting, and feature redundancy. In this study, we propose an efficient feature selection technique for analyzing the importance of each feature component. Using this approach, we selected the most important features for predicting the connectivity pattern of intra-chain disulfide bonds. Our results show that the high-dimensional features contain redundant information, and that prediction performance can be further improved when these features are reduced to a lower-dimensional, more compact space. Our results also indicate that global protein features contribute little to the formation and prediction of disulfide bonds, whereas local sequential and structural information plays an important role. These findings provide important insights for structural studies of disulfide-rich proteins.
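To make the combinatorial growth mentioned above concrete, the following is a minimal sketch that counts and enumerates candidate connectivity patterns. It assumes that every cysteine takes part in exactly one intra-chain bond, so a pattern is a perfect matching of 2n cysteines and the count is (2n)! / (2^n n!). The function names `count_connectivity_patterns` and `enumerate_patterns` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from math import factorial


def count_connectivity_patterns(num_bonds: int) -> int:
    """Count the ways to pair 2 * num_bonds cysteines into num_bonds bonds.

    Assumption: every cysteine is bonded, so each pattern is a perfect matching.
    """
    if num_bonds < 0:
        raise ValueError("num_bonds must be non-negative")
    # Perfect matchings of 2n items: (2n)! / (2^n * n!) = (2n - 1)!!
    return factorial(2 * num_bonds) // (2 ** num_bonds * factorial(num_bonds))


def enumerate_patterns(cysteines):
    """Yield every full pairing of a list of cysteine positions.

    An odd-length list has no full pairing, so nothing is yielded for it.
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    # Pair the first cysteine with each possible partner, then recurse on the rest.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_patterns(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, "bonds:", count_connectivity_patterns(n), "patterns")
    # Hypothetical cysteine positions, used only to show the output shape.
    for pattern in enumerate_patterns([3, 16, 26, 40]):
        print(pattern)
```

Under this assumption, two bonds give 3 candidate patterns and five bonds already give 945, which illustrates why exhaustive scoring of patterns becomes costly for cysteine-rich proteins.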

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Cysteine / chemistry
  • Cysteine / metabolism
  • Disulfides / chemistry*
  • Disulfides / metabolism
  • Predictive Value of Tests
  • Protein Binding
  • Protein Conformation
  • Protein Folding
  • Proteins / chemistry*
  • Proteins / metabolism

Substances

  • Disulfides
  • Proteins
  • Cysteine