Phage Commander, an Application for Rapid Gene Identification in Bacteriophage Genomes Using Multiple Programs

Phage (New Rochelle). 2021 Dec 1;2(4):204-213. doi: 10.1089/phage.2020.0044. Epub 2021 Dec 16.

Abstract

The number of sequenced bacteriophage genomes is growing exponentially. Most sequenced bacteriophage genomes are annotated by one or more of several freely available gene identification programs (Glimmer, GeneMark, RAST, Prodigal, etc.). No program has been shown to consistently outperform the others; thus, the choice of which program to use is not obvious. We present the Phage Commander application for rapid identification of bacteriophage genes using multiple gene identification programs. Phage Commander runs a bacteriophage genome sequence through nine gene identification programs (and an additional program for identification of tRNAs) and integrates the results within a single output table. Phage Commander also generates formatted output files for direct export to National Center for Biotechnology Information GenBank or to genome visualization programs such as DNA Master. Users can select the threshold for which genes to export (genes identified by at least one program, genes identified by at least two programs, etc.). Phage Commander was benchmarked using eight high-quality bacteriophage genomes whose genes are backed by experimental data. Our results show that the most accurate annotations are obtained by exporting genes identified by at least two or three programs. Many groups opt to manually curate the annotations obtained from gene identification programs, and Phage Commander was designed to facilitate such manual curation. Our benchmarking results show that manual curation does indeed produce more accurate annotations than any individual gene identification program. We therefore recommend manually curating the output of Phage Commander to generate maximally accurate annotations. Phage Commander is currently being used in the corresponding author's bacteriophage genome annotation class, where it has reduced the labor cost and improved the quality of genome annotations.
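
The export threshold described in the abstract (genes identified by at least one program, at least two programs, and so on) can be illustrated with a minimal Python sketch. This is not Phage Commander's code: the function name `consensus_genes`, the `(start, stop, strand)` tuple representation, the rule that two calls count as the same gene only when all three values match exactly, and the example coordinates are all assumptions made for illustration. The abstract does not state how the application matches calls across programs.

```python
from collections import defaultdict


def consensus_genes(calls, min_programs=2):
    """Return gene calls supported by at least `min_programs` programs.

    `calls` maps a program name to a list of (start, stop, strand) tuples.
    Assumption: two calls are the same gene only if all three fields match
    exactly. The real application may use a different matching rule.
    """
    support = defaultdict(set)
    for program, genes in calls.items():
        for gene in genes:
            support[gene].add(program)

    kept = [
        (gene, sorted(programs))
        for gene, programs in support.items()
        if len(programs) >= min_programs
    ]
    # Order by leftmost genome coordinate so the table reads left to right.
    kept.sort(key=lambda item: (min(item[0][0], item[0][1]), item[0][2]))
    return kept


# Hypothetical calls from three programs on a made-up genome.
example_calls = {
    "Glimmer": [(100, 400, "+"), (450, 900, "+")],
    "GeneMark": [(100, 400, "+"), (480, 900, "+")],
    "Prodigal": [(100, 400, "+"), (450, 900, "+"), (1200, 1000, "-")],
}

if __name__ == "__main__":
    for threshold in (1, 2, 3):
        genes = consensus_genes(example_calls, min_programs=threshold)
        print(f"threshold {threshold}: {len(genes)} gene(s)")
        for gene, programs in genes:
            print("   ", gene, "called by", ", ".join(programs))
```

Raising the threshold trades sensitivity for specificity: a threshold of one keeps every call, including those made by a single program, while higher thresholds keep only calls on which several programs agree. In this sketch the two calls ending at position 900 count as different genes because their start coordinates differ, which is exactly the kind of disagreement that manual curation is meant to resolve.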

Keywords: bacteriophages; gene identification; genome annotation; genomics.