UGbS-Flex, a novel bioinformatics pipeline for imputation-free SNP discovery in polyploids without a reference genome: finger millet as a case study

BMC Plant Biol. 2018 Jun 15;18(1):117. doi: 10.1186/s12870-018-1316-3.

Abstract

Background: Research on orphan crops is often hindered by a lack of genomic resources. With the advent of affordable sequencing technologies, genotyping an entire genome or, for large-genome species, a representative fraction of the genome has become feasible for any crop. Nevertheless, most genotyping-by-sequencing (GBS) methods are geared towards obtaining large numbers of markers at low sequence depth, which excludes their application in heterozygous individuals. Furthermore, bioinformatics pipelines often lack the flexibility to deal with paired-end reads or to be applied in polyploid species.
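
The depth limitation noted above can be illustrated with a simple calculation. This is a minimal sketch, not part of UGbS-Flex. It assumes a diploid heterozygous site at which each read samples either allele independently with probability 0.5, and it ignores sequencing error and allelic bias. The function name `prob_both_alleles_seen` is hypothetical.

```python
def prob_both_alleles_seen(depth: int) -> float:
    """Probability that both alleles of a heterozygous site appear at least
    once among `depth` reads.

    Assumption (illustrative only): diploid site, each read draws either
    allele with probability 0.5, independently, with no sequencing error.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if depth == 0:
        return 0.0
    # The heterozygote is missed when all reads carry allele A or all carry
    # allele B; each of those two events has probability 0.5 ** depth.
    return 1.0 - 2.0 * 0.5 ** depth


if __name__ == "__main__":
    for d in (1, 2, 4, 8, 16):
        print(f"depth {d:>2}: P(both alleles observed) = {prob_both_alleles_seen(d):.4f}")
```

Under these assumptions, a heterozygote covered by only one or two reads is scored as a homozygote at least half the time, which is why low-depth designs are a poor fit for heterozygous material.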

Results: UGbS-Flex combines publicly available software with in-house Python and Perl scripts to efficiently call SNPs from genotyping-by-sequencing reads irrespective of the species' ploidy level, breeding system and availability of a reference genome. Noteworthy features of the UGbS-Flex pipeline are an ability to use paired-end reads as input, an effective approach to cluster reads across samples with enhanced outputs, and maximization of SNP calling. We demonstrate use of the pipeline for the identification of several thousand high-confidence SNPs with high representation across samples in an F3-derived F2 population in the allotetraploid finger millet. Robust high-density genetic maps were constructed using the time-tested mapping program MAPMAKER, which we upgraded to run efficiently and in a semi-automated manner in a Windows Command Prompt environment. We exploited comparative GBS with one of the diploid ancestors of finger millet to assign linkage groups to subgenomes and demonstrate the presence of chromosomal rearrangements.

Conclusions: The paper combines GBS protocol modifications, a novel flexible GBS analysis pipeline, UGbS-Flex, recommendations to maximize SNP identification, updated genetic mapping software, and the first high-density maps of finger millet. The modules used in the UGbS-Flex pipeline and for genetic mapping were applied to finger millet, an allotetraploid selfing species without a reference genome, as a case study. The UGbS-Flex modules, which can be run independently, are easily transferable to species with other breeding systems or ploidy levels.

Keywords: Chromosomal rearrangements; E. indica; Eleusine coracana; Finger millet; GBS-pipeline; Genetic mapping; Genotyping-by-sequencing (GBS); Paired-end reads; Polyploid; SNP calling.

MeSH terms

  • Chromosome Mapping / methods
  • Computational Biology / methods
  • DNA, Plant / genetics
  • Eleusine / genetics*
  • Genetic Linkage
  • Genome, Plant / genetics
  • Genotyping Techniques / methods*
  • Polymorphism, Single Nucleotide / genetics*
  • Polyploidy*
  • Sequence Analysis, DNA / methods
  • Software

Substances

  • DNA, Plant