Bayesian nonparametrics in protein remote homology search

Bioinformatics. 2016 Sep 15;32(18):2744-52. doi: 10.1093/bioinformatics/btw213. Epub 2016 Apr 22.

Abstract

Motivation: The wide use of three-dimensional protein structure modeling in biomedical research motivates the development of protein sequence alignment tools that combine high alignment accuracy with sensitivity to remotely homologous proteins. In this paper, we aim to improve the quality of alignments between sequence profiles, which are encoded multiple sequence alignments. To this end, we model profile contexts, which are fixed-length profile fragments.
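To make the notion of a profile context concrete, the following is a minimal sketch that cuts a profile into all of its contiguous fixed-length fragments. It assumes a profile is stored as one row of amino acid probabilities per alignment position. The function name extract_contexts, the three-letter toy alphabet, and the numbers are hypothetical and are not taken from COMER.

```python
from typing import List

# Assumed representation: one row of amino acid probabilities per position.
Profile = List[List[float]]


def extract_contexts(profile: Profile, length: int) -> List[Profile]:
    """Return every contiguous fragment of `length` positions (sliding window)."""
    if length < 1:
        raise ValueError("context length must be positive")
    # A profile shorter than `length` yields no contexts.
    return [profile[i:i + length] for i in range(len(profile) - length + 1)]


# Toy profile: 5 positions over a hypothetical 3-letter alphabet
# (a real profile would have 20 columns, one per amino acid).
toy_profile: Profile = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.1, 0.3],
    [0.2, 0.2, 0.6],
]

contexts = extract_contexts(toy_profile, 3)
unit_contexts = extract_contexts(toy_profile, 1)
```

With a context length of one, each context reduces to a single profile position, which corresponds to the unit-length case discussed in the Results.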

Results: We develop a hierarchical Dirichlet process mixture model that describes the distribution of profile contexts and can capture dependencies between amino acids in each context position. The model represents an attempt at modeling profile fragments at several hierarchical levels, within the profile and among profiles. Even modeling contexts of length one yields greater improvements than were previously obtained by processing contexts of length 13. We develop a new profile comparison method, called COMER, that integrates the model. A benchmark against three other profile-to-profile comparison methods shows an increase in both sensitivity and alignment quality.
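As background on Dirichlet process mixtures, the sketch below draws cluster assignments from the Chinese restaurant process, a standard way to sample the partition implied by a single Dirichlet process prior. It is not the hierarchical model developed in the paper and it includes no likelihood over profile contexts. It only illustrates why the number of mixture components does not have to be fixed in advance. The function name crp_assignments and the parameter values are assumptions made for illustration.

```python
import random
from typing import List


def crp_assignments(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Sample cluster labels for n items from a Chinese restaurant process.

    Item i joins existing cluster k with probability counts[k] / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    if alpha <= 0:
        raise ValueError("concentration alpha must be positive")
    rng = random.Random(seed)
    counts: List[int] = []   # counts[k] = number of items already in cluster k
    labels: List[int] = []
    for _ in range(n):
        # The last weight is the unnormalized probability of a new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        labels.append(k)
    return labels


labels = crp_assignments(n=50, alpha=1.0, seed=42)
num_clusters = max(labels) + 1
```

Larger values of alpha tend to produce more clusters. In a mixture model, each cluster would additionally carry its own parameters, and the assignment probabilities would be reweighted by how well each cluster explains the observation.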

Availability and implementation: COMER is open-source software licensed under the GNU GPLv3, available at https://sourceforge.net/projects/comer

Contact: mindaugas.margelevicius@bti.vu.lt

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Bayes Theorem*
  • Models, Molecular
  • Proteins*
  • Sequence Alignment*
  • Sequence Analysis, Protein
  • Sequence Homology, Amino Acid*
  • Software

Substances

  • Proteins