An OpenMP-based tool for finding longest common subsequence in bioinformatics

BMC Res Notes. 2019 Apr 11;12(1):220. doi: 10.1186/s13104-019-4256-6.

Abstract

Objective: Finding the longest common subsequence (LCS) among an arbitrary number of sequences is NP-hard. This is an important problem in bioinformatics for DNA sequence alignment and pattern discovery. In this research, we propose new CPU-based parallel implementations that can provide significant advantages in terms of execution time, monetary cost, and pervasiveness in finding the LCS of DNA sequences in an environment where Graphics Processing Units (GPUs) are not available. For general-purpose use, we also make the OpenMP-based tool publicly available to end users.

Result: In this study, we develop three novel parallel versions of the LCS algorithm on (i) a distributed memory machine using the Message Passing Interface (MPI); (ii) a shared memory machine using OpenMP; and (iii) a hybrid platform that utilizes both distributed and shared memory using MPI-OpenMP. The experimental results with both simulated and real DNA sequence data show that the shared memory OpenMP implementation achieves at least a twofold absolute speedup over the best sequential version of the algorithm and a relative speedup of almost 7. We provide a detailed comparison of the execution times among the implementations on different platforms with different versions of the algorithm. We also show that removing branch conditions negatively affects the performance of the CPU-based parallel algorithm on the OpenMP platform.
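
To illustrate what a shared memory OpenMP formulation of LCS can look like, the following is a minimal sketch in C. It is not the authors' implementation: it uses the textbook dynamic programming table for two sequences and fills it one anti-diagonal at a time (a common "wavefront" parallelization), and the function name lcs_length is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from the paper: returns the length of the
   longest common subsequence of a and b, or -1 if allocation fails. */
int lcs_length(const char *a, const char *b)
{
    size_t n = strlen(a), m = strlen(b);
    size_t cols = m + 1;

    /* (n+1) x (m+1) table in row-major order; calloc zeroes row 0 and
       column 0, which are the base cases of the recurrence. */
    int *t = calloc((n + 1) * cols, sizeof *t);
    if (t == NULL)
        return -1;

    /* Cells with i + j == d read only cells on diagonals d-1 and d-2,
       so every cell on one anti-diagonal can be computed independently. */
    for (size_t d = 2; d <= n + m; d++) {
        long lo = (d > m) ? (long)(d - m) : 1;      /* smallest valid row */
        long hi = (d - 1 < n) ? (long)(d - 1) : (long)n; /* largest valid row */

        #pragma omp parallel for
        for (long i = lo; i <= hi; i++) {
            size_t r = (size_t)i;
            size_t c = d - r;
            if (a[r - 1] == b[c - 1]) {
                t[r * cols + c] = t[(r - 1) * cols + (c - 1)] + 1;
            } else {
                int up = t[(r - 1) * cols + c];
                int left = t[r * cols + (c - 1)];
                t[r * cols + c] = (up > left) ? up : left;
            }
        }
    }

    int result = t[n * cols + m];
    free(t);
    return result;
}
```

With GCC or Clang the sketch can be built with the -fopenmp flag; without it the pragma is ignored and the same code runs sequentially. Opening a parallel region for every diagonal adds overhead, so a tuned implementation would likely restructure the loops or use a different data dependency layout.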

Keywords: DNA sequence alignment; LCS on MPI and OpenMP; Longest common subsequence (LCS); Parallel algorithms for LCS; Tool for finding LCS.

MeSH terms

  • Algorithms*
  • Animals
  • Base Sequence
  • Bees / genetics
  • Computational Biology / methods*
  • Humans
  • Sequence Alignment
  • Sequence Analysis, DNA / statistics & numerical data*
  • Sequence Homology, Nucleic Acid
  • Software*
  • Strigiformes / genetics
  • Viruses / genetics