Computational operon prediction in whole-genomes and metagenomes

Brief Funct Genomics. 2017 Jul 1;16(4):181-193. doi: 10.1093/bfgp/elw034.

Abstract

Microbial diversity in unique environmental settings enables abrupt responses catalysed by altering the gene regulation and formation of gene clusters called operons. Operons increases bacterial adaptability, which in turn increases their survival. This review article presents the emergence of computational operon prediction methods for whole microbial genomes and metagenomes, and discusses their strengths and limitations. Most of the whole-genome operon prediction methods struggle to generalize on unrelated genomes. The applicability of universal whole-genome operon prediction methods to metagenomic data is an interesting yet less investigated question. We have evaluated the potential of various operon prediction features for genomic and metagenomic data. Most of operon prediction methods with high accuracy have been compiled into databases. Despite of the high predictive performance, the data among many databases are not completely consistent for similar species. We performed a correlation analysis between the computationally predicted operon databases and experimentally validated data for Escherichia coli, Bacillus subtilis and Mycobacterium tuberculosis. Operon prediction for most of the less characterized microbes cannot be verified due to absence of experimentally validated operons. The generation of validated information for other microbes would test the authenticity of operon databases for other less annotated microbes as well. Advances in sequencing technologies and development of better analysis methods will help researchers to overcome the technological hurdles (such as long sequencing reads and improved contig size) and further improve operon predictions and better utilize operonic information.

Keywords: gene regulation; metagenome; microbiome; operon prediction; secondary metabolites.

Publication types

  • Review

MeSH terms

  • Artificial Intelligence
  • Computational Biology / methods*
  • Databases, Genetic
  • Metagenome*
  • Operon / genetics*
  • Promoter Regions, Genetic / genetics