Resolving the multiple sequence alignment problem using biogeography-based optimization with multiple populations

J Bioinform Comput Biol. 2015 Aug;13(4):1550016. doi: 10.1142/S021972001550016X. Epub 2015 Apr 30.

Abstract

Multiple sequence alignment (MSA) is one of the most challenging problems in bioinformatics; it involves discovering similarity among a set of protein or DNA sequences. This paper introduces a new method for the MSA problem called biogeography-based optimization with multiple populations (BBOMP). It is based on biogeography-based optimization (BBO), a recent metaheuristic inspired by the mathematics of biogeography. To improve the exploration ability of BBO, we introduce a new concept that allows better exploration of the search space: manipulating multiple populations, each with its own parameters. These parameters are used to build progressive alignments, which increases diversity. At each iteration, the best solution found is injected into each population. Moreover, to improve solution quality, six operators are defined. These operators are selected with a dynamic probability that changes according to each operator's efficiency. To test the performance of the proposed approach, we evaluated it on a set of datasets from BAliBASE 2.0 and compared it with many recent algorithms such as GAPAM, MSA-GA, QEAMSA and RBT-GA. The results show that the proposed approach achieves a better average score than the previously cited methods.

Keywords: Multiple sequence alignment (MSA); biogeography based optimization (BBO); guide tree; iterative alignment; progressive alignment.
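
The abstract states that the six operators are selected with a dynamic probability that follows their efficiency, but it does not give the update rule. The following Python code is a minimal sketch of one common way to do this, not the authors' method. The class name AdaptiveOperatorSelector, the smoothed success-rate formula, and the floor parameter are all assumptions made for illustration.

```python
import random


class AdaptiveOperatorSelector:
    """Pick operators with probabilities that track their past success.

    Hypothetical illustration: the update rule is an assumption,
    not the rule used in the BBOMP paper.
    """

    def __init__(self, operators, floor=0.05, rng=None):
        self.operators = list(operators)
        n = len(self.operators)
        if n == 0 or floor < 0 or floor * n > 1:
            raise ValueError("need at least one operator and 0 <= floor <= 1/n")
        self.floor = floor                  # minimum probability per operator
        self.rng = rng or random.Random()
        self.trials = [0] * n               # times each operator was applied
        self.successes = [0] * n            # times it improved the solution

    def probabilities(self):
        """Return the current selection probability of every operator."""
        # Smoothed success rate, so untried operators start at 0.5.
        rates = [(s + 1) / (t + 2)
                 for s, t in zip(self.successes, self.trials)]
        total = sum(rates)
        n = len(rates)
        # Reserve `floor` for each operator, share the rest by success rate.
        return [self.floor + (1 - n * self.floor) * r / total for r in rates]

    def pick(self):
        """Draw the index of the operator to apply next."""
        indices = range(len(self.operators))
        return self.rng.choices(indices, weights=self.probabilities(), k=1)[0]

    def update(self, index, improved):
        """Record whether applying operator `index` improved the score."""
        self.trials[index] += 1
        if improved:
            self.successes[index] += 1
```

In an optimization loop, one would call pick() to choose an operator, apply it to a candidate alignment, and then call update() with whether the alignment score improved. The floor keeps every operator selectable, so an operator that performs poorly early on can still recover later.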

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Databases, Factual
  • Genetic Variation
  • Geography
  • Mutation
  • Probability
  • Proteins / genetics
  • Sequence Alignment / methods*

Substances

  • Proteins