FAME: fast and memory efficient multiple sequences alignment tool through compatible chain of roots

Etminan Naznooshsadat; Parvinnia Elham; Sharifi-Zarchi Ali

doi:10.1093/bioinformatics/btaa175

Bioinformatics. 2020 Jun 1;36(12):3662-3668. doi: 10.1093/bioinformatics/btaa175.

Authors

Etminan Naznooshsadat¹, Parvinnia Elham¹, Sharifi-Zarchi Ali²

Affiliations

¹ Department of Computer Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran.
² Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.

PMID: 32170927
DOI: 10.1093/bioinformatics/btaa175

Abstract

Motivation: Multiple sequence alignment (MSA) is an important and challenging problem in computational biology. Most existing methods can align only short sequences in an acceptable time; for genome-size sequences, multiple alignment requires a huge amount of processing space and time. A method that can align genome-size sequences rapidly and precisely would therefore have a great effect, especially on the analysis of very long alignments. Herein, we propose an efficient method, called FAME, which vertically divides the sequences at the places where they share common areas and arranges these common areas in consecutive order. The common areas are then shifted and placed under each other, and the subsequences between them are aligned using any existing MSA tool.
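The divide-and-align idea described above can be illustrated with a minimal sketch. It is not the FAME implementation: it assumes that the "common areas" are k-mers occurring exactly once in every sequence, that a simple greedy pass selects an order-consistent chain of them, and that a gap-padding placeholder (`pad_aligner`) stands in for a real MSA tool. All function names and the choice of `k` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def unique_kmer_positions(seq: str, k: int) -> Dict[str, int]:
    """Map each k-mer that occurs exactly once in seq to its start position."""
    first: Dict[str, int] = {}
    repeated = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in first:
            repeated.add(kmer)
        else:
            first[kmer] = i
    return {kmer: pos for kmer, pos in first.items() if kmer not in repeated}


def compatible_chain(seqs: List[str], k: int) -> List[Tuple[int, ...]]:
    """Greedily pick shared k-mers whose positions increase in every sequence.

    Assumption: a greedy scan is used here for brevity; it is not guaranteed
    to find the longest possible chain.
    """
    tables = [unique_kmer_positions(s, k) for s in seqs]
    shared = set(tables[0]).intersection(*tables[1:])
    anchors = sorted(tuple(t[kmer] for t in tables) for kmer in shared)
    chain: List[Tuple[int, ...]] = []
    ends = tuple(0 for _ in seqs)  # first unused position in each sequence
    for anchor in anchors:
        if all(p >= e for p, e in zip(anchor, ends)):
            chain.append(anchor)
            ends = tuple(p + k for p in anchor)
    return chain


def pad_aligner(segments: List[str]) -> List[str]:
    """Placeholder for a real MSA tool: right-pad segments with gaps."""
    width = max(len(s) for s in segments)
    return [s.ljust(width, "-") for s in segments]


def anchored_align(
    seqs: List[str],
    k: int = 4,
    aligner: Callable[[List[str]], List[str]] = pad_aligner,
) -> List[str]:
    """Split sequences at chained anchors and align the pieces in between."""
    rows = ["" for _ in seqs]
    starts = [0] * len(seqs)
    for anchor in compatible_chain(seqs, k):
        # Align the subsequences that precede this anchor.
        between = aligner([s[b:p] for s, b, p in zip(seqs, starts, anchor)])
        # The anchor itself is identical in all sequences, so it is stacked as-is.
        rows = [r + seg + s[p:p + k]
                for r, seg, s, p in zip(rows, between, seqs, anchor)]
        starts = [p + k for p in anchor]
    tail = aligner([s[b:] for s, b in zip(seqs, starts)])
    return [r + t for r, t in zip(rows, tail)]


if __name__ == "__main__":
    for row in anchored_align(["AAGATTACACC", "GATTACAGG"], k=7):
        print(row)
```

In this toy input the only shared 7-mer is `GATTACA`, so it becomes the single anchor and the flanking pieces are handed to the aligner. Passing a wrapper around an external MSA tool as `aligner` would replace the gap-padding placeholder.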

Results: The results demonstrate that FAME, combined with existing MSA methods and the use of minimizers, can run on a personal computer and accurately align long sequences with a much higher sum-of-pairs (SP) score than the standalone MSA tools. As genomic datasets with longer sequences are selected, the SP score of the combined methods gradually improves. The calculated computational complexity of the methods supports these results: combining FAME with the MSA tools leads to at least four times faster execution on the datasets.
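For readers unfamiliar with the SP score mentioned above, the following sketch shows one common way to compute a sum-of-pairs score for a finished alignment. The abstract does not state the scoring scheme used in the paper, so the match, mismatch and gap values here are assumptions chosen for illustration, and `sum_of_pairs` is a hypothetical helper name.

```python
from itertools import combinations
from typing import List


def sum_of_pairs(alignment: List[str], match: int = 1,
                 mismatch: int = -1, gap: int = -2) -> int:
    """Sum pairwise column scores over every pair of rows in the alignment.

    Scoring values are illustrative assumptions, not those of the paper.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap paired with a gap contributes nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-T", "ACGT"]))  # three matches and one gap: 1
```

Under this definition a higher score indicates more matching residue pairs and fewer gaps, which is why it is used to compare alignments of the same sequences.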

Availability and implementation: The source code, all datasets and the run parameters are freely available at http://github.com/naznoosh/msa.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Algorithms*
Computational Biology
Genome
Sequence Alignment
Software*