Seeker: alignment-free identification of bacteriophage genomes by deep learning

Nucleic Acids Res. 2020 Dec 2;48(21):e121. doi: 10.1093/nar/gkaa856.

Abstract

Recent advances in metagenomic sequencing have enabled discovery of diverse, distinct microbes and viruses. Bacteriophages, the most abundant biological entities on Earth, evolve rapidly, and detection of unknown bacteriophages in sequence datasets is therefore a challenge. Most existing detection methods rely on sequence similarity to known bacteriophage sequences, which impedes the identification and characterization of distinct, highly divergent bacteriophage families. Here we present Seeker, a deep-learning tool for alignment-free identification of phage sequences. Seeker allows rapid detection of phages in sequence datasets and differentiation of phage sequences from bacterial ones, even when those phages exhibit little sequence similarity to established phage families. We comprehensively validate Seeker's ability to identify previously unidentified phages, and employ this method to detect unknown phages, some of which are highly divergent from the known phage families. We provide a web portal (seeker.pythonanywhere.com) and a user-friendly Python package (github.com/gussow/seeker) that allow researchers to easily apply Seeker in metagenomic studies for the detection of diverse unknown bacteriophages.
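
As an illustration of what "alignment-free" input to a sequence model can look like, the sketch below turns a raw nucleotide string into numeric vectors without comparing it to any reference genome. The abstract does not state Seeker's input representation, so one-hot encoding and fixed-size windows are assumptions made for this example, and the function names `one_hot_encode` and `split_into_windows` are hypothetical; they are not part of the Seeker package.

```python
# Minimal sketch of alignment-free sequence preprocessing.
# Assumption: one-hot encoding over A/C/G/T; not taken from the Seeker source.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Map each base to a 4-element indicator vector.

    Bases outside A/C/G/T (for example N) become all zeros.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def split_into_windows(sequence, window_size):
    """Cut a sequence into consecutive, non-overlapping windows.

    The final window may be shorter than window_size.
    """
    return [
        sequence[i:i + window_size]
        for i in range(0, len(sequence), window_size)
    ]
```

A classifier trained on such vectors sees only the sequence itself, which is why this kind of approach does not depend on similarity to previously catalogued phage genomes.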

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Bacteria / genetics
  • Bacteria / virology*
  • Bacteriophages / classification
  • Bacteriophages / genetics*
  • Biological Evolution
  • DNA, Viral / genetics*
  • Deep Learning
  • Genome, Viral*
  • Humans
  • Metagenome*
  • Metagenomics / methods
  • Phylogeny
  • Sequence Analysis, DNA
  • Software*

Substances

  • DNA, Viral