Finishing monkeypox genomes from short reads: assembly analysis and a neural network method

BMC Genomics. 2016 Aug 31;17 Suppl 5(Suppl 5):497. doi: 10.1186/s12864-016-2826-8.

Abstract

Background: Poxviruses constitute one of the largest and most complex animal virus families known. The notorious smallpox disease has been eradicated and the virus contained, but its simian sister, monkeypox, is an emerging, untreatable infectious disease that kills 1% to 10% of its human victims. For poxviruses, the emergence of monkeypox outbreaks in humans and the need to monitor potential malicious release of smallpox virus require the development of methods for rapid virus identification. Whole-genome sequencing (WGS) is an emerging technology that is increasingly applied to diagnosing diseases and identifying outbreak pathogens. However, "finishing" such a genome is laborious and time-consuming, and it is not easily automated. To date, large, complete poxvirus genomes have not been studied comprehensively with respect to applying WGS techniques and evaluating genome assembly algorithms.

Results: To explore the limitations of finishing a poxvirus genome from short reads, we first analyze the repetitive regions in a monkeypox genome and evaluate genome assembly on simulated reads. We also report procedures and insights relevant to assembling genomes from realistically short reads. Finally, we propose a neural network method, Neural-KSP, to "finish" the genome by closing the gaps that remain after conventional assembly, as the final stage of a protocol for elucidating clinical poxvirus genomic sequences.
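The abstract does not describe how the repeat analysis or read simulation was carried out. As a toy illustration only (not the authors' pipeline), the sketch below shows why repeats limit assembly from short reads: a read that lies entirely inside a repeat occurs at more than one genome position, so its origin cannot be resolved. The helper names (`kmer_positions`, `repeated_kmers`, `simulate_reads`, `ambiguous_reads`) and the toy genome are hypothetical, and the "simulator" assumes error-free, fixed-length reads tiled at a fixed step, which is far simpler than real read simulators.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map every k-mer in seq to the 0-based start positions where it occurs (in increasing order)."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return dict(positions)


def repeated_kmers(seq, k):
    """Keep only k-mers that occur at two or more positions (exact repeats of length >= k)."""
    return {kmer: pos for kmer, pos in kmer_positions(seq, k).items() if len(pos) > 1}


def simulate_reads(seq, read_len, step):
    """Hypothetical toy simulator: error-free reads of fixed length, one every `step` bases."""
    return [(i, seq[i:i + read_len]) for i in range(0, len(seq) - read_len + 1, step)]


def ambiguous_reads(seq, read_len, step):
    """Reads whose exact sequence occurs at more than one place in seq, so their origin is ambiguous."""
    repeats = repeated_kmers(seq, read_len)
    return [(start, read) for start, read in simulate_reads(seq, read_len, step) if read in repeats]


# Toy genome (not monkeypox data): the 8 bp repeat "GATTACAG" occurs twice.
genome = "TTTT" + "GATTACAG" + "CCCC" + "GATTACAG" + "AAAA"

short_hits = ambiguous_reads(genome, read_len=6, step=1)  # reads shorter than the repeat
long_hits = ambiguous_reads(genome, read_len=9, step=1)   # reads longer than the repeat
print(len(short_hits), len(long_hits))  # 6 0
```

With 6 bp reads, the six reads lying wholly inside either repeat copy cannot be placed uniquely; with 9 bp reads, every read spans unique flanking sequence and none is ambiguous. This mirrors, in miniature, why repeats longer than the read length leave gaps that conventional assembly cannot close on its own.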

Conclusions: The protocol may prove useful for any clinical viral isolate (regardless of whether a reference-strain sequence is available) and especially for genomes confounded by many embedded global and local repetitive sequences. This work highlights the feasibility of finishing real, complex genomes by systematically analyzing their genetic characteristics and remedying the shortcomings of existing assembly with a neural network method. Such finished sequences may enable clinicians to track the genetic distance between viral isolates, which provides a powerful epidemiological tool.
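The abstract does not say which distance measure is used to compare isolates. As one simple, assumed choice, the sketch below computes the uncorrected p-distance (the fraction of aligned, ungapped sites at which two sequences differ) for every pair of isolates. It presumes the sequences are already aligned to equal length; the isolate names and sequences are hypothetical toy data, and real epidemiological work would typically use proper alignment and substitution-model-corrected distances.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Uncorrected p-distance: fraction of compared sites that differ, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        diffs += x != y  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return diffs / compared


def distance_table(isolates):
    """Pairwise p-distances for a dict of {isolate name: aligned sequence}."""
    return {
        (name1, name2): p_distance(seq1, seq2)
        for (name1, seq1), (name2, seq2) in combinations(isolates.items(), 2)
    }


# Hypothetical, already-aligned toy sequences (not real monkeypox data).
isolates = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTACGTTC",
    "iso_C": "ACG-ACGAAC",
}
for pair, d in distance_table(isolates).items():
    print(pair, round(d, 3))
```

Here iso_A and iso_B differ at 1 of 10 sites (0.1), while comparisons involving iso_C use only the 9 ungapped columns. Skipping gaps is a design choice; treating gaps as differences would give different values.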

Keywords: Gap filling; Graph; Neural Network; Poxvirus; Public health; Repetitive sequence; Whole-genome sequencing; de novo Assembly.

MeSH terms

  • Genome, Viral*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Monkeypox virus / genetics*
  • Mpox (monkeypox) / virology
  • Neural Networks, Computer
  • Sequence Analysis, DNA