Beyond sequencing: machine learning algorithms extract biology hidden in Nanopore signal data

Trends Genet. 2022 Mar;38(3):246-257. doi: 10.1016/j.tig.2021.09.001. Epub 2021 Oct 25.

Abstract

Nanopore sequencing provides signal data corresponding to the nucleotide motifs sequenced. Through machine learning-based methods, these signals are translated into long-read sequences that overcome the read-length limit of short-read sequencing. However, analyzing the raw nanopore signal data offers opportunities well beyond sequencing genomes and transcriptomes: machine learning algorithms that extract biological information from these signals allow the detection of DNA and RNA modifications, the estimation of poly(A) tail length, and the prediction of RNA secondary structures. In this review, we discuss how developments in machine learning methodologies have contributed to more accurate basecalling and lower error rates, and how these methods enable new biological discoveries. We argue that direct nanopore sequencing of DNA and RNA provides a new dimension for genomics experiments, and we highlight challenges and future directions for computational approaches that extract the additional information provided by nanopore signal data.

Keywords: DNA/RNA modifications; basecalling; direct RNA-seq; machine learning; nanopore current signal; nanopore sequencing.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Genomics
  • High-Throughput Nucleotide Sequencing / methods
  • Machine Learning
  • Nanopore Sequencing*
  • Nanopores*
  • Sequence Analysis, DNA / methods