Learning Shapelets for Improving Single-Molecule Nanopore Sensing

Anal Chem. 2019 Aug 6;91(15):10033-10039. doi: 10.1021/acs.analchem.9b01896. Epub 2019 Jul 18.

Abstract

The nanopore technique employs a nanoscale cavity to electrochemically confine individual molecules, achieving ultrasensitive single-molecule analysis based on evaluating the amplitude and duration of the ionic current. However, each nanopore sensing interface has its own intrinsic sensing ability, which does not always efficiently generate distinctive blockade currents for multiple analytes. Therefore, analytes that differ at only a single site often exhibit similar blockade currents or durations in nanopore experiments, which often produces serious overlap in the resulting statistical graphs. To improve the sensing ability of nanopores, herein we propose a novel shapelet-based machine learning approach to discriminate mixed analytes that exhibit nearly identical blockade current amplitudes and durations. DNA oligomers with a single-nucleotide difference, 5'-AAAA-3' and 5'-GAAA-3', are employed as model analytes that are difficult to identify in aerolysin nanopores at 100 mV. First, a set of the most informative and discriminative segments is learned from the time-series data set of blockade current signals using the learning time-series shapelets (LTS) algorithm. Then, the shapelet-transformed representation of the signals is obtained by calculating the minimum distance between the shapelets and the original signals. A simple logistic classifier is used to identify the two types of DNA oligomers from the corresponding shapelet-transformed representation. Finally, an evaluation is performed on the validation data set to show that our approach can achieve a high F1 score of 0.933. In comparison with the conventional statistical methods for the analysis of duration and residual current, the shapelet-transformed representation provides clearly discriminated distributions for multiple analytes. Taking advantage of the robust LTS algorithm, one could anticipate the real-time analysis of nanopore events for the direct identification and quantification of multiple biomolecules in a complex real sample (e.g., serum) without labels and time-consuming mutagenesis.
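
The pipeline described above (shapelet transform by minimum distance, a logistic classifier on the transformed features, and F1 scoring) can be illustrated with a minimal sketch. This is not the authors' implementation. The shapelet here is hand-picked rather than learned, whereas the LTS algorithm learns the shapelets from the data. The toy signals, the mean-squared-distance definition, the learning rate, and all function names are assumptions made for illustration only, and the toy score is computed on the training signals rather than on a held-out validation set.

```python
import math


def min_distance(signal, shapelet):
    """Smallest mean squared distance between the shapelet and any
    same-length window of the signal (assumed distance definition)."""
    m = len(shapelet)
    if m == 0 or m > len(signal):
        raise ValueError("shapelet must be non-empty and no longer than the signal")
    best = math.inf
    for start in range(len(signal) - m + 1):
        d = sum((signal[start + i] - shapelet[i]) ** 2 for i in range(m)) / m
        best = min(best, d)
    return best


def shapelet_transform(signals, shapelets):
    """Map each signal to a feature vector with one minimum distance per shapelet."""
    return [[min_distance(s, sh) for sh in shapelets] for s in signals]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=1.0, epochs=5000):
    """Fit a logistic classifier with plain batch gradient descent."""
    n, k = len(features), len(features[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_b += err / n
            for j in range(k):
                grad_w[j] += err * x[j] / n
        b -= lr * grad_b
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
    return w, b


def predict(features, w, b):
    """Predict class 1 when the linear score is positive, else class 0."""
    return [1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            for x in features]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def demo():
    # Hypothetical toy "current traces": class 1 contains a short spike,
    # class 0 does not. Real blockade signals are far longer and noisier.
    signals = [
        [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.9, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.1, 0.0, 0.1, 0.0, 0.0, 0.1],
    ]
    labels = [1, 1, 0, 0]
    shapelets = [[0.0, 1.0, 0.0]]  # hand-picked, not learned as in LTS
    features = shapelet_transform(signals, shapelets)
    w, b = train_logistic(features, labels)
    preds = predict(features, w, b)
    return features, preds, f1_score(labels, preds)
```

Calling `demo()` returns the shapelet-transformed features, the predicted labels, and the F1 score for the toy set. Signals that contain the spike lie close to the shapelet (small minimum distance), while the others lie farther away, which is the kind of separation in the transformed representation that the abstract contrasts with overlapping duration and residual-current statistics.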

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bacterial Toxins / chemistry
  • Base Sequence
  • DNA / chemistry*
  • Nanopores*
  • Nucleotides / chemistry
  • Pore Forming Cytotoxic Proteins / chemistry

Substances

  • Bacterial Toxins
  • Nucleotides
  • Pore Forming Cytotoxic Proteins
  • aerolysin
  • DNA