Seq2Sat and SatAnalyzer toolkit: Towards comprehensive microsatellite genotyping from sequencing data

Mol Ecol Resour. 2024 Apr;24(3):e13929. doi: 10.1111/1755-0998.13929. Epub 2024 Jan 30.

Abstract

Accurate and efficient microsatellite loci genotyping is an essential process in population genetics that is also used in various demographic analyses. Protocols for next-generation sequencing of microsatellite loci enable high-throughput, cross-compatible allele scoring, addressing two common limitations of conventional capillary-based approaches. To improve this process, we have developed an all-in-one software tool, Seq2Sat (sequence to microsatellite), written in C++, to support automated microsatellite genotyping. It takes raw reads of microsatellite amplicons directly and conducts read quality control before inferring genotypes based on read depth, read ratio, sequence composition and length. We have also developed a module for sex identification based on sex chromosome-specific locus amplicons. To allow for greater user access and to complement autoscoring, we developed SatAnalyzer (microsatellite analyzer), a user-friendly web-based platform that conducts reads-to-report analyses by calling Seq2Sat for genotype autoscoring and produces interactive genotype graphs for manual editing. SatAnalyzer also allows users to troubleshoot multiplex optimization by analysing read quality and distribution across loci and samples in support of high-quality library preparation. To evaluate its performance, we benchmarked our toolkit Seq2Sat/SatAnalyzer against a conventional capillary gel method and an existing microsatellite genotyping software, MEGASAT, using two datasets. Results showed that SatAnalyzer can achieve >99.70% genotyping accuracy and that Seq2Sat is ~5 times faster than MEGASAT while generating many more informative tables and figures. Seq2Sat and SatAnalyzer are freely available on GitHub (https://github.com/ecogenomicscanada/Seq2Sat) and Docker Hub (https://hub.docker.com/r/rocpengliu/satanalyzer).
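
As an illustration of how read depth and read ratio can drive genotype inference at a single locus, the following is a minimal sketch in Python. It is not Seq2Sat's actual algorithm (which is implemented in C++ and also considers sequence composition); the function name `call_genotype` and the thresholds `min_depth` and `min_ratio` are hypothetical choices made only for this example.

```python
from collections import Counter


def call_genotype(allele_lengths, min_depth=10, min_ratio=0.3):
    """Call a diploid genotype at one locus from per-read allele lengths.

    allele_lengths: iterable with one allele length (in bp) per read.
    min_depth:      hypothetical minimum total read depth needed to call.
    min_ratio:      hypothetical minimum (second allele / top allele) read
                    ratio for the second allele to be accepted.
    Returns a sorted (allele1, allele2) tuple, or None if depth is too low.
    """
    counts = Counter(allele_lengths)
    total_depth = sum(counts.values())
    if total_depth < min_depth:
        return None  # not enough reads to make a call

    # Rank alleles by read count (descending), breaking ties by length.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top_len, top_n = ranked[0]

    if len(ranked) > 1:
        second_len, second_n = ranked[1]
        # Accept the second allele only if its read ratio is high enough;
        # low-ratio alleles are treated as noise (e.g. stutter products).
        if second_n / top_n >= min_ratio:
            return tuple(sorted((top_len, second_len)))

    return (top_len, top_len)  # homozygous call


# Example: two well-supported alleles plus a few noise reads.
reads = [120] * 50 + [124] * 40 + [122] * 3
print(call_genotype(reads))  # (120, 124)
```

In this sketch a locus with 60 reads of length 120 and 5 reads of length 118 would be called homozygous, because the read ratio 5/60 falls below the assumed `min_ratio`. A real caller would tune such thresholds per locus and inspect the allele sequences as well as their lengths.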

Keywords: autoscoring; manual scoring; microsatellite genotyping; population genetics; sequence-based genotyping.

MeSH terms

  • Alleles
  • Genetics, Population*
  • Genotype
  • Genotyping Techniques / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Microsatellite Repeats
  • Sequence Analysis, DNA / methods
  • Software*

Grants and funding