A Guide to Sequencing for Long Repetitive Regions

Methods Mol Biol. 2023;2632:131-146. doi: 10.1007/978-1-0716-2996-3_10.

Abstract

Full-length analysis of genes with highly repetitive sequences is challenging in two respects: the assembly algorithm and sequencing accuracy. The de Bruijn graph, often used in short-read assembly, cannot distinguish adjacent repeat units. Long reads, on the other hand, are not yet accurate enough to identify each individual repeat unit. In this chapter, I present an example strategy that solves these problems and recovers the full length of long repeats by combining two steps: extraction and assembly of repeat units based on overlap-layout-consensus, and scaffolding with long reads.
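The claim about de Bruijn graphs can be illustrated with a toy Python sketch. It is not taken from the chapter. The flanks, the 5-bp repeat unit, and the k values are all hypothetical, and the identical repeat copies are the extreme case of nearly identical adjacent units. The sketch builds the set of k-mers (the edges of an order-k de Bruijn graph, with (k-1)-mers as nodes) for two loci that differ only in repeat copy number. When k is short relative to the repeat array, both loci yield the same graph, so the graph itself no longer records how many units sit side by side.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Distinct k-mers of seq; these are the edges of an order-k de Bruijn graph."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def de_bruijn_graph(seq: str, k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer node to the (k-1)-mers that follow it in some k-mer."""
    graph: dict[str, set[str]] = defaultdict(set)
    for km in kmers(seq, k):
        graph[km[:-1]].add(km[1:])
    return dict(graph)


# Hypothetical toy locus: identical flanks around 3 vs 5 tandem copies of a 5-bp unit.
LEFT, UNIT, RIGHT = "TTTCC", "GACTA", "CCGGG"
SEQ_3 = LEFT + UNIT * 3 + RIGHT
SEQ_5 = LEFT + UNIT * 5 + RIGHT

if __name__ == "__main__":
    # With k = 4 the repeat collapses into a single cycle in both graphs,
    # so the copy number (3 vs 5) cannot be read back from the graph.
    print(de_bruijn_graph(SEQ_3, 4) == de_bruijn_graph(SEQ_5, 4))  # True
    # A 17-mer spans the whole 3-copy array plus one flanking base on each side,
    # so at this k the two loci are distinguishable.
    print(kmers(SEQ_3, 17) == kmers(SEQ_5, 17))  # False
```

In real data, k is limited by read length, so arrays much longer than the reads collapse in the same way. Overlap-layout-consensus assembly instead works on whole-read overlaps, and long-read scaffolding supplies the long-range order. This is the motivation the abstract gives for combining the two.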

Keywords: De novo sequencing; Highly repetitive sequence; Non-model organism; Overlap-layout-consensus; Structural protein.

MeSH terms

  • Algorithms
  • High-Throughput Nucleotide Sequencing*
  • Repetitive Sequences, Nucleic Acid* / genetics
  • Sequence Analysis, DNA