SVLR: Genome Structural Variant Detection Using Long-Read Sequencing Data

J Comput Biol. 2021 Aug;28(8):774-788. doi: 10.1089/cmb.2021.0048. Epub 2021 May 10.

Abstract

Genome structural variants (SVs) have a great impact on human phenotype and diversity, and have been linked to numerous diseases. Long-read sequencing technologies have made it possible to find SVs as long as 10,000 nucleotides. As a result, long-read-based SV detection has attracted considerable research attention, and many tools have recently been developed to detect SVs from long reads. In this article, we present a new method, called SVLR, to detect SVs based on long-read sequencing data. Compared with existing methods, SVLR can detect three new kinds of SVs: block replacements, block interchanges, and translocations. Although these new SVs are structurally more complicated, SVLR detects them with accuracy comparable to that achieved for the classic SVs. Moreover, for the classic SVs that can be detected by state-of-the-art methods (e.g., SVIM and Sniffles), our experiments demonstrate recall improvements of up to 38% without harming precision (which remains >78%). We also point out three directions for further improving SV detection in the future. Source code: https://github.com/GWYSDU/SVLR.

Keywords: genome structural variant; genome structural variant detection; long-read sequencing and single-molecule sequencing; third-generation sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Disease / genetics*
  • Genomic Structural Variation*
  • Humans
  • Sequence Analysis, DNA
  • Single Molecule Imaging