MgCod: Gene Prediction in Phage Genomes with Multiple Genetic Codes

J Mol Biol. 2023 Jul 15;435(14):168159. doi: 10.1016/j.jmb.2023.168159. Epub 2023 May 25.

Abstract

Massive sequencing of microbiomes has led to the discovery of a large number of phage genomes with intermittent stop codon recoding. We have developed a computational tool, MgCod, that identifies genomic regions (blocks) with distinct stop codon recoding simultaneously with the prediction of protein-coding regions. When MgCod was used to scan a large volume of human metagenomic contigs, hundreds of viral contigs with intermittent stop codon recoding were revealed. Many of these contigs originated from genomes of known crAssphages. Further analyses showed that intermittent recoding was associated with subtle patterns in the organization of protein-coding genes, such as 'single-coding' and 'dual-coding'. The dual-coding genes, clustered into blocks, could be translated by two alternative codes, producing nearly identical proteins. The dual-coded blocks were enriched with early-stage phage genes, while the late-stage genes resided in the single-coded blocks. MgCod can identify types of stop codon recoding in novel genomic sequences in parallel with gene prediction. It is available for download from https://github.com/gatech-genemark/MgCod.
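
The distinction between 'single-coding' and 'dual-coding' genes can be illustrated with a minimal Python sketch. This is not MgCod's algorithm. It only shows what it means for a sequence to translate the same way under two genetic codes. The abstract does not say which stop codons are recoded, so the TAG-to-glutamine reassignment used here is an assumption chosen for illustration, and the function names and toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed recoding for illustration only: TAG read as glutamine (Q).
TAG_TO_Q = {**STANDARD, "TAG": "Q"}


def translate_to_stop(dna, table):
    """Translate frame 0 of `dna` until the first stop codon in `table`."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def is_dual_coding(dna, table_a, table_b):
    """True if both codes yield the same protein (hypothetical criterion)."""
    return translate_to_stop(dna, table_a) == translate_to_stop(dna, table_b)


# Toy ORFs (hypothetical): the first has no in-frame TAG, the second does.
dual_like = "ATGAAACAGGGTTAA"
single_like = "ATGAAATAGGGTTAA"

print(translate_to_stop(dual_like, STANDARD))    # MKQG
print(translate_to_stop(dual_like, TAG_TO_Q))    # MKQG
print(translate_to_stop(single_like, STANDARD))  # MK
print(translate_to_stop(single_like, TAG_TO_Q))  # MKQG
```

In this toy setting the first sequence gives the same protein under either code, while the second is truncated under the standard code and reads through only when TAG is recoded. The abstract reports 'nearly identical' rather than identical products, so a real comparison would likely need a similarity threshold instead of strict equality.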

Keywords: crAssphages; gene prediction; metagenomes; multiple genetic codes; stop codon recoding.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bacteriophages* / genetics
  • Codon, Terminator* / genetics
  • Genome, Viral*
  • Humans
  • Proteins / genetics
  • Sequence Analysis

Substances

  • Codon, Terminator
  • Proteins