BaseFormer: Transformer based Base-Caller for Fast and Accurate Next Generation Sequencing

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:463-466. doi: 10.1109/EMBC48229.2022.9871730.

Abstract

Gene sequencing technology greatly impacts modern biology and medicine. Next-generation sequencing (NGS) lies at the heart of gene sequencing because of its massively increased throughput, but analyzing the large quantities of fluorescent images with high accuracy is difficult: the fluorescent signals are weak, the noise varies, and current designs are limited in accuracy and speed. In this paper, we propose a novel deep-learning-based gene sequencing pipeline with a semi-automatic labelling method. The results are promising, especially on high-density data: BaseFormer surpasses traditional methods in cluster quality (Q30: 88%) and throughput (16.5% better), while keeping a similarly low error rate (down to 0.137% on average, best at 0.068% on high-density data).
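The abstract reports Q30 without defining it. Under the standard Phred convention, a quality score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so Q30 means at most 1 error in 1000, and a "Q30" percentage is the share of called bases with Q >= 30. The following is a minimal sketch of that calculation only; the function names and the sample scores are hypothetical and are not taken from the paper or from BaseFormer.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def q30_fraction(quality_scores):
    """Return the fraction of base calls whose Phred score is at least 30."""
    if not quality_scores:
        raise ValueError("quality_scores must not be empty")
    passing = sum(1 for q in quality_scores if q >= 30)
    return passing / len(quality_scores)


# Hypothetical per-base scores, for illustration only.
example_scores = [10, 30, 35, 29]
example_q30 = q30_fraction(example_scores)
```

Read this way, a Q30 of 88% would mean that 88% of the called bases have an estimated error probability of 0.1% or lower.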

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Coloring Agents
  • Electric Power Supplies
  • Heart
  • High-Throughput Nucleotide Sequencing*
  • Medicine*

Substances

  • Coloring Agents