Clair3-trio: high-performance Nanopore long-read variant calling in family trios with trio-to-trio deep neural networks

Brief Bioinform. 2022 Sep 20;23(5):bbac301. doi: 10.1093/bib/bbac301.

Abstract

Accurate identification of genetic variants from family trio (child-mother-father) sequencing data is important in genomics. However, state-of-the-art approaches treat variant calling from trios as three independent tasks, which limits their calling accuracy for Nanopore long-read sequencing data. For better trio variant calling, we introduce Clair3-Trio, the first variant caller tailored for family trio data from Nanopore long reads. Clair3-Trio employs a Trio-to-Trio deep neural network model, which takes the sequencing information of all three family members as input and outputs the predicted variants for the whole trio within a single model to improve variant calling. We also present MCVLoss, a novel loss function tailor-made for variant calling in trios that leverages an explicit encoding of Mendelian inheritance. Clair3-Trio showed comprehensive improvement in experiments. It predicted far fewer variants that violate Mendelian inheritance than current state-of-the-art methods. We also demonstrated that our Trio-to-Trio model is more accurate than competing architectures. Clair3-Trio is accessible as a free, open-source project at https://github.com/HKU-BAL/Clair3-Trio.
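To make the notion of a Mendelian inheritance violation concrete, the following is a minimal sketch of how such violations could be counted from called genotypes. It is not the authors' MCVLoss or any part of the Clair3-Trio code. It assumes unphased, diploid, autosomal genotypes written as allele-index pairs (0 for the reference allele, 1 and above for alternate alleles), and the function names `is_mendelian_consistent` and `count_violations` are hypothetical. A genuine de novo mutation would also be counted as a violation by this check.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's unphased diploid genotype can be formed
    from one maternal allele plus one paternal allele.

    Genotypes are pairs of allele indices (assumed encoding: 0 = reference).
    """
    child_sorted = tuple(sorted(child))
    # Try every (maternal allele, paternal allele) combination.
    for m_allele, f_allele in product(mother, father):
        if tuple(sorted((m_allele, f_allele))) == child_sorted:
            return True
    return False


def count_violations(trio_calls):
    """Count sites whose (child, mother, father) genotypes break Mendelian inheritance."""
    return sum(
        1
        for child, mother, father in trio_calls
        if not is_mendelian_consistent(child, mother, father)
    )


# Illustrative, made-up genotypes: (child, mother, father) per site.
calls = [
    ((0, 1), (0, 1), (0, 0)),  # consistent: alternate allele from the mother
    ((1, 1), (0, 0), (0, 1)),  # violation: mother carries no alternate allele
    ((0, 0), (1, 1), (0, 1)),  # violation: mother can only transmit allele 1
    ((1, 2), (0, 1), (0, 2)),  # consistent multi-allelic site
]
print(count_violations(calls))  # 2
```

Calling each sample independently gives no guarantee that the three genotypes at a site pass this kind of check, which is the gap that joint trio calling aims to narrow.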

Keywords: Mendelian inheritance; deep neural networks; family trios; nanopore long-read; variant calling.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Nanopores*
  • Neural Networks, Computer
  • Sequence Analysis, DNA
  • Software