Mingle: A Command Line Utility for Merging Multi-fasta Files

J Comput Biol. 2019 Apr;26(4):396-404. doi: 10.1089/cmb.2018.0243. Epub 2019 Feb 14.

Abstract

Massively parallel sequencing (MPS) has become a standard technique in molecular biology, and its application has spread from the analysis of the human genome to that of virtually all other organisms. MPS analysis requires reference genomes and, in some cases, multiple genomes need to be handled as a single unit to carry out genetic analysis. Nucleic acid sequences are typically stored in "fasta" files, which can contain multiple genomes ("multi-fasta"). Although a multi-fasta file can be converted into a single sequence using specific computer commands, the resulting file does not keep track of the boundaries of the original sequences, which makes it difficult to determine which genome the reads obtained from MPS belong to. In this study, we introduce mingle, a shell script that creates custom reference genomes by merging multi-fasta files and provides a list of the boundaries of the individual genomes that can be used for downstream analysis.
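The abstract does not show mingle's code or its output format. To illustrate the general idea only, the following Python sketch does three things. It concatenates the records of a multi-fasta file into one sequence. It records where each original record starts and ends in the merged sequence. It then uses that table to report which original sequence contains a given position, such as the alignment start of a read. The function names (`read_fasta`, `merge_records`, `locate`, `to_fasta`), the 1-based inclusive coordinates, and the choice to keep only the first word of each header are assumptions made for this sketch. mingle itself is a shell script and may work differently.

```python
"""Illustrative sketch (not mingle itself): merge a multi-fasta file into a
single sequence while keeping a table of the original record boundaries."""

import bisect
import io
from typing import Iterable, Iterator, List, TextIO, Tuple

Boundary = Tuple[str, int, int]  # (record name, start, end), 1-based inclusive


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; sequence lines before the first header are ignored."""
    name = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else ""  # assumption: keep only the ID
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def merge_records(records: Iterable[Tuple[str, str]]) -> Tuple[str, List[Boundary]]:
    """Concatenate all sequences and record each one's span in the merged sequence."""
    parts: List[str] = []
    boundaries: List[Boundary] = []
    offset = 0
    for name, seq in records:
        parts.append(seq)
        boundaries.append((name, offset + 1, offset + len(seq)))
        offset += len(seq)
    return "".join(parts), boundaries


def locate(boundaries: List[Boundary], position: int) -> str:
    """Return the name of the original record containing a 1-based merged position."""
    starts = [start for _, start, _ in boundaries]
    i = bisect.bisect_right(starts, position) - 1  # last record starting at or before position
    if i < 0 or position > boundaries[i][2]:
        raise ValueError(f"position {position} is outside the merged sequence")
    return boundaries[i][0]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as fasta text, wrapping lines at `width` characters."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = io.StringIO(">chrA human\nACGT\nAC\n>virusB\nGGGTT\n")
    merged, bounds = merge_records(read_fasta(example))
    print(merged)             # ACGTACGGGTT
    print(bounds)             # [('chrA', 1, 6), ('virusB', 7, 11)]
    print(locate(bounds, 8))  # virusB
```

Each lookup is a binary search over record start positions, so its cost grows logarithmically with the number of merged records. When many reads are assigned, building the `starts` list once instead of on every call would be a natural refinement. A read whose alignment crosses a boundary would need both its start and end positions checked.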

Keywords: alignment; fasta; genome; reference; sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Computational Biology / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Sequence Alignment
  • Whole Genome Sequencing / methods*