Improved sequence mapping using a complete reference genome and lift-over

Nat Methods. 2024 Jan;21(1):41-49. doi: 10.1038/s41592-023-02069-6. Epub 2023 Nov 30.

Abstract

Complete, telomere-to-telomere (T2T) genome assemblies promise improved analyses and the discovery of new variants, but many essential genomic resources remain associated with older reference genomes. Thus, there is a need to translate genomic features and read alignments between references. Here we describe a method called levioSAM2 that performs fast and accurate lift-over between assemblies using a whole-genome map. In addition to enabling the use of several references, we demonstrate that aligning reads to a high-quality reference (for example, T2T-CHM13) and lifting to an older reference (for example, the Genome Reference Consortium (GRC) assembly GRCh38) improves the accuracy of the resulting variant calls on the old reference. By leveraging the quality improvements of T2T-CHM13, levioSAM2 reduces small and structural variant calling errors compared with GRC-based mapping using real short- and long-read datasets. Performance is especially improved for a set of complex medically relevant genes, where the GRC references are of lower quality.
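
To make the idea of coordinate lift-over concrete, the following is a minimal sketch of translating a single position from one assembly to another through a list of ungapped aligned blocks. It is not levioSAM2's implementation: the `Block` type, the `lift_position` function, and the toy coordinates are all hypothetical, and the sketch assumes a single contig, forward-strand alignments only, and blocks sorted by source start without overlap.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """One ungapped aligned block between two assemblies (hypothetical type)."""
    src_start: int  # 0-based start on the source assembly
    dst_start: int  # 0-based start on the target assembly
    length: int     # number of aligned bases in the block


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Translate a 0-based source position to the target assembly.

    Assumes blocks are sorted by src_start and do not overlap.
    Returns None when pos falls outside every block (an unaligned gap).
    """
    starts = [b.src_start for b in blocks]
    # Find the last block whose start is <= pos.
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    block = blocks[i]
    offset = pos - block.src_start
    if offset >= block.length:
        return None  # pos lies in the gap after this block
    return block.dst_start + offset


# Toy map: two aligned blocks separated by a 10-base unaligned gap.
toy_map = [Block(0, 100, 50), Block(60, 150, 40)]
lifted = lift_position(toy_map, 10)      # inside the first block
unmapped = lift_position(toy_map, 55)    # inside the gap
```

A real lift-over of read alignments also has to handle reverse-strand blocks, multiple contigs, and updates to alignment fields beyond the position, none of which this sketch attempts.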

MeSH terms

  • Chromosome Mapping
  • Genome*
  • Genomics* / methods
  • High-Throughput Nucleotide Sequencing
  • Sequence Analysis, DNA / methods