Machine learning meets genome assembly

Kleber Padovani de Souza; João Carlos Setubal; André Carlos Ponce de Leon F de Carvalho; Guilherme Oliveira; Annie Chateau; Ronnie Alves

doi:10.1093/bib/bby072

Brief Bioinform. 2019 Nov 27;20(6):2116-2129. doi: 10.1093/bib/bby072.

Authors

Kleber Padovani de Souza¹, João Carlos Setubal²˒³, André Carlos Ponce de Leon F de Carvalho⁴, Guilherme Oliveira⁵, Annie Chateau⁴, Ronnie Alves¹˒⁵

Affiliations

¹ Federal University of Pará, Brazil.
² University of São Paulo, Brazil.
³ Department of Computer Science, University of São Paulo, Brazil.
⁴ Vale Technology Institute-Sustainable Development, Brazil.
⁵ University of Montpellier, LIRMM, France.

PMID: 30137230
DOI: 10.1093/bib/bby072

Abstract

Motivation: Recent advances in DNA sequencing technologies have made the study of the genetic composition of living organisms more accessible to researchers, leading to several advances, especially in the health sciences. However, many challenges that arise from the complexity of sequencing projects remain unsolved. Among them is the task of assembling DNA fragments from previously unsequenced organisms, which is classified as an NP-hard (nondeterministic polynomial-time hard) problem: no known algorithm solves it exactly in reasonable execution time. Nevertheless, several tools that produce approximate solutions have yielded results that have facilitated scientific discoveries, although there is ample room for improvement. As with other NP-hard problems, machine learning algorithms have been used in recent years in an attempt to find better solutions to the DNA fragment assembly problem, although still on a small scale.

Results: This paper presents a broad review of the pioneering literature on artificial intelligence-based DNA assemblers, particularly those that use machine learning, to provide an overview of state-of-the-art approaches and to serve as a starting point for further study in this field.

Keywords: de novo assembly; artificial intelligence; genome assembly; machine learning; metagenomics.

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Algorithms
Genome*
High-Throughput Nucleotide Sequencing / methods
Machine Learning*
Sequence Analysis, DNA