Protein bioinformatics and mixtures of bivariate von Mises distributions for angular data

Biometrics. 2007 Jun;63(2):505-12. doi: 10.1111/j.1541-0420.2006.00682.x.

Abstract

A fundamental problem in bioinformatics is to characterize the secondary structure of a protein, which has traditionally been carried out by examining a scatterplot (Ramachandran plot) of the conformational angles. We examine two natural bivariate von Mises distributions, referred to as the Sine and Cosine models, which each have five parameters and, for concentrated data, tend to a bivariate normal distribution. These are analyzed and their main properties derived. Conditions on the parameters are established which result in bimodal behavior for the joint density and the marginal distributions, and we note an interesting situation in which the joint density is bimodal but the marginal distributions are unimodal. We carry out comparisons of the two models, and it is seen that the Cosine model may be preferred. Mixture distributions of the Cosine model are fitted to two representative protein datasets using the expectation maximization algorithm, which results in an objective partition of the scatterplot into a number of components. Our results are consistent with empirical observations; new insights are discussed.
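
The abstract does not state the density formulas. As an illustration only, the sketch below uses the parameterisation commonly given for these two models, which is an assumption here and not taken from the text above: each has two mean angles (mu, nu), two concentrations (k1, k2) and one dependence parameter (lam for the Sine model, k3 for the Cosine model). All function names are hypothetical, and the normalizing constant is computed numerically on a grid, not from a closed form.

```python
import numpy as np

def sine_log_kernel(phi, psi, mu, nu, k1, k2, lam):
    # Assumed Sine model: log of the unnormalized density.
    return (k1 * np.cos(phi - mu) + k2 * np.cos(psi - nu)
            + lam * np.sin(phi - mu) * np.sin(psi - nu))

def cosine_log_kernel(phi, psi, mu, nu, k1, k2, k3):
    # Assumed Cosine model: log of the unnormalized density.
    return (k1 * np.cos(phi - mu) + k2 * np.cos(psi - nu)
            - k3 * np.cos(phi - mu - psi + nu))

def normalizing_constant(log_kernel, params, n=256):
    # Integrate exp(log_kernel) over the torus [-pi, pi)^2 with an
    # equally spaced grid; this rule is very accurate for smooth
    # periodic integrands.
    grid = np.linspace(-np.pi, np.pi, n, endpoint=False)
    phi, psi = np.meshgrid(grid, grid, indexing="ij")
    return np.exp(log_kernel(phi, psi, *params)).mean() * (2 * np.pi) ** 2

def density(log_kernel, phi, psi, params, n=256):
    # Normalized density evaluated at the angles (phi, psi), in radians.
    z = normalizing_constant(log_kernel, params, n)
    return np.exp(log_kernel(phi, psi, *params)) / z
```

For example, density(cosine_log_kernel, phi, psi, (mu, nu, k1, k2, k3)) evaluates one Cosine component at a pair of conformational angles; a mixture of the kind fitted in the paper would be a weighted sum of such components.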

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Likelihood Functions
  • Malate Dehydrogenase / chemistry
  • Models, Statistical
  • Myoglobin / chemistry
  • Protein Conformation
  • Protein Structure, Secondary
  • Proteins / chemistry*

Substances

  • Myoglobin
  • Proteins
  • Malate Dehydrogenase