Using SPAdes, AUGUSTUS, and BLAST in an Automated Pipeline for Clustering Homologous Exome Sequences

Curr Protoc. 2022 May;2(5):e449. doi: 10.1002/cpz1.449.

Abstract

Cross-species exome sequencing approaches provide unprecedented avenues for obtaining information on genetic diversity, evolutionary relationships, and gene function from a variety of organisms, including non-model species. These approaches offer cost-effective opportunities to study multiple individuals or species in parallel. However, they also create bioinformatics challenges, because several powerful tools must be applied in sequence to identify homologous gene families across individual or species boundaries. Popular tools of this kind include SPAdes for sequence assembly, AUGUSTUS for ab initio gene prediction, and BLAST for building homologous sequence families. These tools can also be complex to install and use. Here, we present detailed steps for running these tools to recover exon sequences from raw cross-species exome-capture data and cluster them into homologous sequence families. We also present a utility pipeline, CODSEQCP, that automates these steps to cluster exon sequences, facilitating population genomics and evolutionary studies.

© 2022 Wiley Periodicals LLC.

  • Basic Protocol 1: Reads assembly using SPAdes
  • Basic Protocol 2: Coding sequence extraction using AUGUSTUS
  • Basic Protocol 3: Sequence clustering using BLAST
  • Alternate Protocol: How to run CODSEQCP
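The abstract does not say exactly how BLAST hits are turned into homologous sequence families in CODSEQCP. As one illustration only, assume an all-vs-all BLAST search of the predicted coding sequences saved in the default 12-column BLAST+ tabular format (`-outfmt 6`). Hits that pass an e-value and percent-identity cutoff can then be merged into families by single-linkage clustering (connected components, computed with union-find). The function names and the thresholds `max_evalue` and `min_pident` are hypothetical and are not CODSEQCP's parameters; this is a sketch, not the authors' method.

```python
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(lines):
    """Yield (query, subject, percent identity, e-value) from -outfmt 6 lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        row = dict(zip(FIELDS, line.split("\t")))
        yield row["qseqid"], row["sseqid"], float(row["pident"]), float(row["evalue"])


def cluster_homologs(lines, max_evalue=1e-10, min_pident=70.0):
    """Single-linkage clustering of sequences connected by accepted BLAST hits.

    max_evalue and min_pident are hypothetical illustration cutoffs.
    Returns a sorted list of clusters, each a sorted list of sequence IDs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, pident, evalue in parse_hits(lines):
        # Register both sequences so that ones with no accepted hit stay singletons.
        find(query)
        find(subject)
        if query != subject and evalue <= max_evalue and pident >= min_pident:
            union(query, subject)

    groups = defaultdict(list)
    for seq_id in parent:
        groups[find(seq_id)].append(seq_id)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Made-up hits between sequences from three hypothetical species (spA, spB, spC).
    hits = [
        "spA_g1\tspA_g1\t100.0\t300\t0\t0\t1\t300\t1\t300\t1e-160\t555",
        "spA_g1\tspB_g7\t92.5\t300\t22\t1\t1\t300\t1\t299\t1e-120\t480",
        "spB_g7\tspC_g3\t88.0\t280\t33\t2\t5\t284\t3\t282\t1e-90\t400",
        "spA_g2\tspB_g9\t55.0\t120\t54\t4\t10\t129\t8\t127\t1e-05\t60",
    ]
    for family in cluster_homologs(hits):
        print(family)
```

With these made-up hits, `spA_g1`, `spB_g7`, and `spC_g3` fall into one family through a chain of accepted hits, while the weak `spA_g2`/`spB_g9` hit fails both cutoffs and leaves each as a singleton. Single linkage is simple but can chain distantly related sequences together, so a real pipeline may prefer stricter criteria such as reciprocal best hits or coverage filters.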

Keywords: AUGUSTUS; BLAST; SPAdes; cross-species exome capture; homologous clustering.

MeSH terms

  • Cluster Analysis
  • Computational Biology* / methods
  • Exome* / genetics
  • Humans
  • Sequence Homology