Principal component analysis on a torus: Theory and application to protein dynamics

J Chem Phys. 2017 Dec 28;147(24):244101. doi: 10.1063/1.4998259.

Abstract

A dimensionality reduction method for high-dimensional circular data is developed, based on a principal component analysis (PCA) of data points on a torus. Adopting a geometrical view of PCA, various distance measures on a torus are introduced and the associated problem of projecting data onto the principal subspaces is discussed. The main idea is that the (periodicity-induced) projection error can be minimized by transforming the data such that the maximal gap of the sampling is shifted to the periodic boundary. In a second step, the covariance matrix and its eigendecomposition can be computed in a standard manner. Using molecular dynamics simulations of two well-established biomolecular systems (Aib9 and villin headpiece), the potential of the method to analyze the dynamics of backbone dihedral angles is demonstrated. The new approach allows for a robust and well-defined construction of metastable states and provides low-dimensional reaction coordinates that accurately describe the free energy landscape. Moreover, it offers a direct interpretation of covariances and principal components in terms of the angular variables. Apart from its application to PCA, the method of maximal gap shifting is general and can be applied to any other dimensionality reduction method for circular data.
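
The two steps described in the abstract (shifting the maximal sampling gap to the periodic boundary, then computing a standard covariance matrix and eigendecomposition) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `shift_max_gap` and `torus_pca` are hypothetical, angles are assumed to be in radians, and centering the largest gap exactly on the ±π boundary is one plausible reading of "shifted to the periodic boundary".

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def shift_max_gap(angles):
    """Shift each angular coordinate so that its largest sampling gap
    is centered on the periodic boundary at +/- pi.

    angles: array of shape (n_samples, n_angles), in radians.
    (Hypothetical helper; the name and conventions are assumptions.)
    """
    angles = np.asarray(angles, dtype=float)
    shifted = np.empty_like(angles)
    for j in range(angles.shape[1]):
        col = np.mod(angles[:, j], TWO_PI)  # map to [0, 2*pi)
        s = np.sort(col)
        # Gaps between consecutive sorted samples, including the
        # wrap-around gap from the last sample back to the first.
        gaps = np.diff(np.concatenate([s, [s[0] + TWO_PI]]))
        k = np.argmax(gaps)
        cut = s[k] + 0.5 * gaps[k]  # midpoint of the largest gap
        # Move the cut to the boundary, then recenter on zero.
        shifted[:, j] = np.mod(col - cut, TWO_PI) - np.pi
    return shifted


def torus_pca(angles):
    """Gap-shift the angles, then run an ordinary PCA on the result.

    Returns eigenvalues (descending), eigenvectors (as columns),
    and the projections of the centered data onto the eigenvectors.
    """
    x = shift_max_gap(angles)
    x_centered = x - x.mean(axis=0)
    cov = np.atleast_2d(np.cov(x_centered, rowvar=False))
    evals, evecs = np.linalg.eigh(cov)  # ascending eigenvalue order
    order = np.argsort(evals)[::-1]     # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs, x_centered @ evecs


# Toy data: the first angle straddles the +/- pi boundary, so its raw
# values jump between about +3 and -3 although the points are close
# together on the circle.
toy = np.array([[3.0, 0.1], [-3.0, -0.1], [3.1, 0.0], [-3.1, 0.05]])
toy_shifted = shift_max_gap(toy)
toy_evals, toy_evecs, toy_proj = torus_pca(toy)
```

In the toy data the raw first column spans almost the full 2π range, whereas after shifting both columns occupy a narrow interval, so the covariance reflects the actual spread on the circle rather than the artificial jump at the boundary.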

MeSH terms

  • Microfilament Proteins / chemistry
  • Models, Chemical
  • Molecular Dynamics Simulation
  • Principal Component Analysis
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Thermodynamics

Substances

  • Microfilament Proteins
  • Proteins
  • villin