Principal component analysis in protein tertiary structure prediction

J Bioinform Comput Biol. 2018 Apr;16(2):1850005. doi: 10.1142/S0219720018500051. Epub 2018 Feb 22.

Abstract

We discuss applicability of principal component analysis (PCA) for protein tertiary structure prediction from amino acid sequence. The algorithm presented in this paper belongs to the category of protein refinement models and involves establishing a low-dimensional space where the sampling (and optimization) is carried out via particle swarm optimizer (PSO). The reduced space is found via PCA performed for a set of low-energy protein models previously found using different optimization techniques. A high frequency term is added into this expansion by projecting the best decoy into the PCA basis set and calculating the residual model. This term is aimed at providing high frequency details in the energy optimization. The goal of this research is to analyze how the dimensionality reduction affects the prediction capability of the PSO procedure. For that purpose, different proteins from the Critical Assessment of Techniques for Protein Structure Prediction experiments were modeled. In all the cases, both the energy of the best decoy and the distance to the native structure have decreased. Our analysis also shows how the predicted backbone structure of native conformation and of alternative low energy states varies with respect to the PCA dimensionality. Generally speaking, the reconstruction can be successfully achieved with 10 principal components and the high frequency term. We also provide a computational analysis of protein energy landscape for the inverse problem of reconstructing structure from the reduced number of principal components, showing that the dimensionality reduction alleviates the ill-posed character of this high-dimensional energy optimization problem. The procedure explained in this paper is very fast and allows testing different PCA expansions. Our results show that PSO improves the energy of the best decoy used in the PCA when the adequate number of PCA terms is considered.

Keywords: Principal component analysis; conformational sampling; particle swarm optimization; protein structure refinement; tertiary protein structure.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology / methods*
  • Models, Molecular*
  • Principal Component Analysis*
  • Protein Structure, Tertiary*
  • Proteins / chemistry
  • Proteins / metabolism
  • Uracil-DNA Glycosidase / chemistry
  • Uracil-DNA Glycosidase / metabolism

Substances

  • Proteins
  • Uracil-DNA Glycosidase