INfrastructure for a PHAge REference Database: Identification of Large-Scale Biases in the Current Collection of Cultured Phage Genomes

Phage (New Rochelle). 2021 Dec 1;2(4):214-223. doi: 10.1089/phage.2021.0007. Epub 2021 Dec 16.

Abstract

Background: With advances in sequencing technology and decreasing costs, the number of sequenced phage genomes has increased markedly over the past decade.

Materials and Methods: We developed an automated retrieval and analysis system for phage genomes (https://github.com/RyanCook94/inphared) to produce the INfrastructure for a PHAge REference Database (INPHARED), a database of phage genomes and associated metadata.

Results: As of January 2021, 14,244 complete phage genomes had been sequenced. The INPHARED data set is dominated by phages that infect a small number of bacterial genera, with 75% of phages isolated on only 30 bacterial genera. There is a further bias in lifestyle: our database contains significantly more lytic phage genomes (∼70%) than temperate phage genomes (∼30%). Collectively, these biases mean that ∼54% of temperate phage genomes originate from just three host genera. Given the ongoing debate over the carriage of antibiotic resistance genes by phages and its implications for the safety of phage therapy, we searched for putative antibiotic resistance genes. Antibiotic resistance genes were carried more frequently by temperate phages than by lytic phages, and carriage frequency again varied with host.

Conclusions: Given the bias in currently sequenced phage genomes, we suggest that, to fully understand phage diversity, efforts should be made to isolate and sequence a larger number of phages, in particular temperate phages, from a greater diversity of hosts.

Keywords: antibiotic resistance genes; jumbo phages; phage genomes; virulence genes.
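The host-concentration figures in the Results (75% of phages isolated on 30 genera; ∼54% of temperate genomes from three genera) can be read as cumulative shares over host genera ranked by genome count, computed either on the whole collection or on the temperate subset only. The following is a minimal Python sketch of that arithmetic, assuming each genome is summarized as a hypothetical (host genus, lifestyle) pair. The function names and the toy records are illustrative placeholders, not INPHARED's data, file format, or code.

```python
from collections import Counter


def top_k_share(genera, k):
    """Fraction of genomes whose host genus is among the k most frequent genera."""
    counts = Counter(genera)
    # most_common(k) returns the k (genus, count) pairs with the highest counts.
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(genera)


def lifestyle_fraction(records, lifestyle):
    """Fraction of (genus, lifestyle) records carrying the given lifestyle label."""
    return sum(1 for _, ls in records if ls == lifestyle) / len(records)


# Hypothetical toy records (host_genus, lifestyle); NOT INPHARED data.
records = (
    [("GenusA", "lytic")] * 3 + [("GenusA", "temperate")]
    + [("GenusB", "lytic")] * 2 + [("GenusB", "temperate")]
    + [("GenusC", "temperate"), ("GenusD", "lytic"), ("GenusE", "lytic")]
)

# Host bias over the whole collection.
all_hosts = [g for g, _ in records]
# Combined bias: restrict to temperate genomes first, then rank their hosts.
temperate_hosts = [g for g, ls in records if ls == "temperate"]

print(f"top-2 genera cover {top_k_share(all_hosts, 2):.0%} of genomes")
print(f"temperate fraction: {lifestyle_fraction(records, 'temperate'):.0%}")
print(f"top-2 temperate hosts cover {top_k_share(temperate_hosts, 2):.0%} of temperate genomes")
```

Subsetting to temperate genomes before ranking is what lets a lifestyle bias and a host bias compound into a single concentration figure, as in the "collectively" statement above.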