SLHSD: hybrid scaffolding method based on short and long reads

Brief Bioinform. 2023 May 19;24(3):bbad169. doi: 10.1093/bib/bbad169.

Abstract

In genome assembly, scaffolding can obtain more complete and continuous scaffolds. Current scaffolding methods usually adopt one type of read to construct a scaffold graph and then orient and order contigs. However, scaffolding with the strengths of two or more types of reads seems to be a better solution to some tricky problems. Combining the advantages of different types of data is significant for scaffolding. Here, a hybrid scaffolding method (SLHSD) is present that simultaneously leverages the precision of short reads and the length advantage of long reads. Building an optimal scaffold graph is an important foundation for getting scaffolds. SLHSD uses a new algorithm that combines long and short read alignment information to determine whether to add an edge and how to calculate the edge weight in a scaffold graph. In addition, SLHSD develops a strategy to ensure that edges with high confidence can be added to the graph with priority. Then, a linear programming model is used to detect and remove remaining false edges in the graph. We compared SLHSD with other scaffolding methods on five datasets. Experimental results show that SLHSD outperforms other methods. The open-source code of SLHSD is available at https://github.com/luojunwei/SLHSD.

Keywords: alignments; genome assembly; hybrid assembly; scaffolding.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • High-Throughput Nucleotide Sequencing* / methods
  • Linear Models
  • Sequence Analysis, DNA / methods
  • Software