Multiple Variant Calling Pipelines in Wheat Whole Exome Sequencing

Int J Mol Sci. 2021 Sep 27;22(19):10400. doi: 10.3390/ijms221910400.

Abstract

The highly challenging hexaploid wheat (Triticum aestivum) genome is becoming ever more accessible due to the continued development of multiple reference genomes, which aids efforts to better understand variation in important traits. Although the process of variant calling is relatively straightforward, selecting the best combination of computational tools for the read alignment and variant calling stages, and efficiently filtering false variant calls, are not always easy tasks. Previous studies have analyzed the impact of methods on quality metrics in diploid organisms. Given that variant identification in wheat largely relies on accurate mining of exome data, there is a critical need to better understand how different methods affect the analysis of whole exome sequencing (WES) data in polyploid species. This study aims to address this need by performing whole exome sequencing of 48 wheat cultivars and assessing the performance of various variant calling pipelines at their suggested settings. The results show that all the pipelines require filtering to eliminate false-positive calls. The high consensus among the reference SNPs called by the best-performing pipelines suggests that filtering provides accurate and reproducible results. This study also provides detailed comparisons of sensitivity and precision at the individual and population levels for the raw and filtered SNP calls.
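The abstract evaluates pipelines by the sensitivity and precision of raw versus filtered SNP calls and by the consensus among pipelines. The Python sketch below illustrates those metrics under stated assumptions. Each call is keyed by chromosome, position, reference allele and alternate allele, and a truth set is available. The `Snp` type, the `hard_filter` thresholds (QUAL >= 30, depth >= 10) and all toy data are hypothetical. They are not the study's actual filtering criteria or results.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Snp(NamedTuple):
    """Hypothetical SNP key: a call is identified by position and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def hard_filter(calls: dict[Snp, tuple[float, int]],
                min_qual: float = 30.0, min_depth: int = 10) -> set[Snp]:
    """Keep calls whose (QUAL, depth) pass illustrative thresholds (assumed values)."""
    return {snp for snp, (qual, depth) in calls.items()
            if qual >= min_qual and depth >= min_depth}


def sensitivity_precision(called: Iterable[Snp],
                          truth: Iterable[Snp]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP)."""
    called_set, truth_set = set(called), set(truth)
    tp = len(called_set & truth_set)   # calls confirmed by the truth set
    fp = len(called_set - truth_set)   # calls absent from the truth set
    fn = len(truth_set - called_set)   # true SNPs the pipeline missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


def consensus(*call_sets: Iterable[Snp]) -> set[Snp]:
    """SNPs reported by every pipeline (intersection of all call sets)."""
    sets = [set(s) for s in call_sets]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Toy data only; chromosome names, positions and scores are invented.
    truth = {Snp("1A", 100, "A", "G"), Snp("1A", 200, "C", "T"),
             Snp("2B", 50, "G", "A")}
    raw = {
        Snp("1A", 100, "A", "G"): (60.0, 25),
        Snp("1A", 200, "C", "T"): (45.0, 18),
        Snp("2B", 50, "G", "A"): (12.0, 6),   # true SNP, low support
        Snp("3D", 75, "T", "C"): (15.0, 4),   # false positive, low support
    }
    print("raw:", sensitivity_precision(raw, truth))
    print("filtered:", sensitivity_precision(hard_filter(raw), truth))
    other_pipeline = {Snp("1A", 100, "A", "G"), Snp("2B", 50, "G", "A")}
    print("consensus:", consensus(hard_filter(raw), other_pipeline))
```

In this toy case, filtering raises precision from 0.75 to 1.0. It also lowers sensitivity from 1.0 to about 0.67, because the filter removes one true but weakly supported SNP along with the false positive. Filter thresholds therefore need to be chosen with both metrics in view.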

Keywords: BCFtools; BWA; Bowtie2; SNPs; STAR; WES; variants; wheat.

MeSH terms

  • Exome Sequencing*
  • Genome, Plant*
  • Polymorphism, Single Nucleotide*
  • Polyploidy*
  • Triticum / genetics*