On a greedy approach for genome scaffolding

Algorithms Mol Biol. 2022 Oct 29;17(1):16. doi: 10.1186/s13015-022-00223-x.

Abstract

Background: Scaffolding is a bioinformatics problem that aims to complete the contig assembly process by determining the relative position and orientation of the contigs. It can be modeled as a path-and-cycle cover problem on a particular graph called the "scaffold graph".

Results: We provide NP-hardness and inapproximability results for this problem. We also adapt a greedy approximation algorithm for complete graphs so that it works on a special class of graphs intended to be close to real instances. The resulting algorithm is the first polynomial-time approximation algorithm designed for this problem on non-complete graphs.
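
To make the greedy idea concrete, here is a minimal sketch of a weight-ordered greedy selection on a toy scaffold graph. It is not the algorithm of the paper: the function name `greedy_cover`, the representation of contigs as pairs of extremities, and the link weights are all assumptions made for illustration, and the sketch ignores any bound on the number of paths and cycles that a real instance may impose.

```python
from typing import List, Set, Tuple

Contig = Tuple[str, str]          # the two extremities of one contig
Link = Tuple[str, str, float]     # (extremity, extremity, weight)


def greedy_cover(contigs: List[Contig], links: List[Link]) -> List[Link]:
    """Pick inter-contig links greedily by decreasing weight.

    Each contig extremity is used by at most one chosen link, so the
    chosen links together with the contig edges form vertex-disjoint
    alternating paths and cycles. Illustrative only (hypothetical helper).
    """
    extremities: Set[str] = {v for contig in contigs for v in contig}
    used: Set[str] = set()
    chosen: List[Link] = []
    # Heaviest links first; Python's sort is stable, so ties keep input order.
    for u, v, w in sorted(links, key=lambda e: e[2], reverse=True):
        if u == v or u not in extremities or v not in extremities:
            continue  # ignore self-loops and links to unknown extremities
        if u in used or v in used:
            continue  # one of the extremities is already linked
        used.update((u, v))
        chosen.append((u, v, w))
    return chosen


# Toy instance (made-up weights): three contigs a, b, c.
contigs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
links = [("a2", "b1", 10.0), ("a2", "c1", 7.0),
         ("b2", "c1", 5.0), ("c2", "a1", 1.0)]
result = greedy_cover(contigs, links)
```

On this toy input the link a2-c1 is rejected because a2 is already taken by the heavier link a2-b1, and the three remaining links close a single cycle through all contigs.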

Conclusion: Tests on a set of simulated instances show that our algorithm provides better results than the version on complete graphs.

Keywords: Approximation; Complexity; Dynamic programming; Genome scaffolding; Poly-APX-hardness.