GSAlign: an efficient sequence alignment tool for intra-species genomes

BMC Genomics. 2020 Feb 24;21(1):182. doi: 10.1186/s12864-020-6569-1.

Abstract

Background: Personal genomics and comparative genomics are becoming more important in clinical practice and genome research. Both fields require sequence alignment to discover sequence conservation and variation. Though many methods have been developed, some are designed for small genome comparison while others are not efficient for large genome comparison. Moreover, most existing genome comparison tools have not been systematically evaluated for the correctness of their sequence alignments. A wrong sequence alignment would produce false sequence variants.

Results: In this study, we present GSAlign, an efficient sequence alignment tool for intra-species genomes that handles large genome sequence alignment and identifies sequence variants from the alignment result. We estimate performance by measuring the correctness of the predicted sequence variations. The experimental results demonstrate that GSAlign is not only faster than most existing state-of-the-art methods but also identifies sequence variants with high accuracy.
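As a rough illustration of what identifying variants from an alignment and scoring their correctness can look like, the sketch below walks a single gapped pairwise alignment column by column and then compares the predicted variants against a known truth set. It is not GSAlign's implementation; the function names `call_variants` and `precision_recall`, the tuple layout, and the toy sequences are all assumptions made for this example.

```python
def call_variants(ref_aln, qry_aln):
    """Sketch: list (ref_pos, kind, ref_bases, alt_bases) from one gapped alignment.

    ref_aln and qry_aln are equal-length strings that use '-' for gaps.
    Positions are 1-based reference coordinates (hypothetical convention).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned strings must have equal length")
    variants = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        if ref_aln[i] == "-":
            # Insertion in the query: merge consecutive gap columns.
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            # Reported at the last reference base before the insertion.
            variants.append((ref_pos, "INS", "", qry_aln[i:j]))
            i = j
        elif qry_aln[i] == "-":
            # Deletion in the query: merge consecutive gap columns.
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            variants.append((ref_pos + 1, "DEL", ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            if ref_aln[i] != qry_aln[i]:
                variants.append((ref_pos, "SNV", ref_aln[i], qry_aln[i]))
            i += 1
    return variants


def precision_recall(predicted, truth):
    """Sketch: exact-match precision and recall of predicted variants."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy alignment (made up for illustration, not data from the study).
calls = call_variants("ACGT-ACGTA", "ACCTGAC--A")
known = calls + [(9, "SNV", "A", "T")]  # pretend one true variant was missed
print(calls)
print(precision_recall(calls, known))
```

Exact tuple matching is the simplest possible scoring rule; a real evaluation would also need to normalize equivalent indel representations before comparing.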

Conclusions: As more genome sequences become available, the demand for genome comparison is increasing. Therefore, an efficient and robust algorithm is highly desirable. We believe GSAlign can be a useful tool: it combines ultra-fast alignment with high accuracy and sensitivity for detecting sequence variations.

Keywords: Comparative genomics; Genome comparison; Personal genomics; Sequence alignment; Variation detection.

MeSH terms

  • Algorithms
  • Genome*
  • Genomics / methods*
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA
  • Software*