NanoMod: a computational tool to detect DNA modifications using Nanopore long-read sequencing data

BMC Genomics. 2019 Feb 4;20(Suppl 1):78. doi: 10.1186/s12864-018-5372-8.

Abstract

Background: Recent advances in single-molecule sequencing techniques, such as Nanopore sequencing, improved read length, increased sequencing throughput, and enabled direct detection of DNA modifications through the analysis of raw signals. These DNA modifications include naturally occurring modifications such as DNA methylations, as well as modifications that are introduced by DNA damage or through synthetic modifications to one of the four standard nucleotides.

Methods: To improve the performance of detecting DNA modifications, especially synthetically introduced modifications, we developed a novel computational tool called NanoMod. NanoMod takes raw signal data from a pair of DNA samples, one with and one without modified bases, extracts signal intensities, performs base error correction based on a reference sequence, and then identifies modified bases by comparing the distributions of raw signals between the two samples, while taking into account the effects of neighboring bases on modified bases ("neighborhood effects").
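The comparison step described in Methods can be illustrated with a minimal sketch. The abstract does not name the statistical test, so everything below is an assumption rather than NanoMod's code or API. It assumes per-read mean signals have already been assigned to reference positions (after signal annotation and base error correction). It then applies a two-sample Kolmogorov-Smirnov test at each position and folds in neighborhood effects by combining p-values over a ±`window` of adjacent positions with Stouffer's Z method. The names `ks_statistic`, `ks_pvalue`, `stouffer`, `detect`, and `simulate_pair`, the window size, and the toy shift values are all hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import Dict, List, Tuple

# Hypothetical layout: reference position -> per-read mean signal at that position.
Signals = Dict[int, List[float]]


def ks_statistic(a: List[float], b: List[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value equal to x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d: float, n: int, m: int) -> float:
    """Asymptotic KS p-value with Stephens' small-sample correction."""
    sq = math.sqrt(n * m / (n + m))
    lam = (sq + 0.12 + 0.11 / sq) * d
    if lam < 0.3:
        # Q_KS(0.3) is within 1e-4 of 1, and the series converges slowly here.
        return 1.0
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)


def stouffer(pvals: List[float]) -> float:
    """Combine p-values with equal-weight Stouffer's Z method."""
    # Clip p-values away from 0 and 1 so the probit transform stays finite.
    z = [-NormalDist().inv_cdf(min(max(p, 1e-300), 1.0 - 1e-12)) for p in pvals]
    big_z = sum(z) / math.sqrt(len(z))
    # Upper-tail normal probability via erfc to keep precision for large Z.
    return 0.5 * math.erfc(big_z / math.sqrt(2.0))


def detect(mod: Signals, unmod: Signals, window: int = 1) -> Dict[int, float]:
    """Per-position KS test, then Stouffer-combine p-values within +/- window."""
    raw = {}
    for pos in sorted(set(mod) & set(unmod)):
        a, b = mod[pos], unmod[pos]
        raw[pos] = ks_pvalue(ks_statistic(a, b), len(a), len(b))
    combined = {}
    for pos in raw:
        neighbors = [raw[q] for q in range(pos - window, pos + window + 1) if q in raw]
        combined[pos] = stouffer(neighbors)
    return combined


def simulate_pair(n_pos: int = 21, mod_pos: int = 10, shift: float = 2.0,
                  neighbor_shift: float = 0.5, reads: int = 100,
                  seed: int = 0) -> Tuple[Signals, Signals]:
    """Hypothetical toy data: unit-variance Gaussian signals. The modified base
    shifts the mean by `shift` and its immediate neighbors by `neighbor_shift`."""
    rng = random.Random(seed)
    mod: Signals = {}
    unmod: Signals = {}
    for pos in range(n_pos):
        if pos == mod_pos:
            mu = shift
        elif abs(pos - mod_pos) == 1:
            mu = neighbor_shift
        else:
            mu = 0.0
        unmod[pos] = [rng.gauss(0.0, 1.0) for _ in range(reads)]
        mod[pos] = [rng.gauss(mu, 1.0) for _ in range(reads)]
    return mod, unmod
```

For example, `scores = detect(*simulate_pair(), window=1)` should give position 10 a very small combined p-value, while positions far from it stay non-significant. The KS test is used here because it compares whole distributions without assuming a signal shape. Equal-weight Stouffer over a fixed window is only the simplest way to pool evidence from neighbors. A real tool might weight positions differently or tune the window to the magnitude of the neighborhood effect.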

Results: We evaluated NanoMod on simulation data sets, based on different types of modifications and different magnitudes of neighborhood effects, and found that NanoMod outperformed other methods in identifying known modified bases. Additionally, we demonstrated superior performance of NanoMod on an E. coli data set with 5mC (5-methylcytosine) modifications.

Conclusions: In summary, NanoMod is a flexible tool to detect DNA modifications with single-base resolution from raw signals in Nanopore sequencing, and will facilitate large-scale functional genomics experiments that use modified nucleotides.

Keywords: Computational tool; DNA modifications; Nanopore long-read data; Nanopore signal annotation; Statistical analysis.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computer Simulation
  • DNA* / chemistry
  • DNA* / genetics
  • DNA* / metabolism
  • Escherichia coli / genetics
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Nanopores
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods*
  • Software*
  • Workflow

Substances

  • DNA