Comprehensive benchmarking of software for mapping whole genome bisulfite data: from read alignment to DNA methylation analysis

Brief Bioinform. 2021 Sep 2;22(5):bbab021. doi: 10.1093/bib/bbab021.

Abstract

Whole genome bisulfite sequencing is currently at the forefront of epigenetic analysis, facilitating the nucleotide-level resolution of 5-methylcytosine (5mC) on a genome-wide scale. Specialized software has been developed to accommodate the unique difficulties in aligning such sequencing reads to a given reference, building on the knowledge acquired from model organisms such as human or Arabidopsis thaliana. As the field of epigenetics expands its purview to non-model plant species, new challenges arise that call into question the suitability of previously established tools. Herein, nine short-read aligners are evaluated: Bismark, BS-Seeker2, BSMAP, BWA-meth, ERNE-BS5, GEM3, GSNAP, Last and segemehl. Precision-recall of simulated alignments, in comparison to real sequencing data obtained from three natural accessions, reveals that, on balance, BWA-meth and BSMAP are able to make the best use of the data during mapping. The influence of difficult-to-map regions, characterized by deviations in sequencing depth over repeat annotations, is evaluated in terms of the mean absolute deviation of the resulting methylation calls in comparison to a realistic methylome. Downstream methylation analysis is responsive to the handling of multi-mapping reads relative to mapping quality (MAPQ), and potentially susceptible to bias arising from the increased sequence complexity of densely methylated reads.
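
The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the dictionary-based inputs, the `tolerance` parameter, and the reading of "mean absolute deviation" as the mean absolute difference between called and reference per-site methylation levels are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical representation: (chromosome, 1-based position).
Position = Tuple[str, int]


def precision_recall(
    truth: Dict[str, Position],
    mapped: Dict[str, Position],
    tolerance: int = 0,
) -> Tuple[float, float]:
    """Precision and recall of simulated read alignments.

    truth  : read id -> simulated origin (every simulated read).
    mapped : read id -> reported alignment (only reads the aligner placed).
    A read counts as correct if it lies on the same chromosome and within
    `tolerance` bases of its simulated origin (assumed criterion).
    """
    correct = 0
    for read_id, (chrom, pos) in mapped.items():
        origin = truth.get(read_id)
        if origin is not None and origin[0] == chrom and abs(origin[1] - pos) <= tolerance:
            correct += 1
    precision = correct / len(mapped) if mapped else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


def mean_absolute_deviation(
    called: Dict[Position, float],
    reference: Dict[Position, float],
) -> float:
    """Mean absolute difference in methylation level (0..1) over the
    cytosine positions present in both the called and reference methylomes."""
    shared = called.keys() & reference.keys()
    if not shared:
        raise ValueError("no shared cytosine positions")
    return sum(abs(called[p] - reference[p]) for p in shared) / len(shared)
```

Under these assumptions, an aligner that places few reads but places them correctly scores high precision and low recall, which is why the two values are reported together.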

Keywords: DNA methylation; WGBS mapping software; benchmark; epigenetics; non-model plants; whole genome bisulfite sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking / methods*
  • Chromosome Mapping / methods
  • DNA Methylation / genetics*
  • DNA, Plant / drug effects
  • DNA, Plant / genetics
  • Epigenesis, Genetic
  • Epigenomics / methods*
  • Fragaria / genetics*
  • Genome, Plant*
  • Poaceae / genetics*
  • Sequence Alignment / methods
  • Software*
  • Sulfites / pharmacology*
  • Thlaspi / genetics*
  • Whole Genome Sequencing / methods

Substances

  • DNA, Plant
  • Sulfites
  • hydrogen sulfite