Theoretical prediction and experimental verification of protein-coding genes in plant pathogen genome Agrobacterium tumefaciens strain C58

PLoS One. 2012;7(9):e43176. doi: 10.1371/journal.pone.0043176. Epub 2012 Sep 11.

Abstract

Agrobacterium tumefaciens strain C58 is a Gram-negative soil bacterium capable of inducing tumors (crown galls) on many dicotyledonous plants. The genome of A. tumefaciens strain C58 was re-annotated using the Z-curve method. First, all 'hypothetical genes' were re-examined, and 29 originally annotated 'hypothetical genes' were recognized as non-coding open reading frames (ORFs). Theoretical evidence from principal component analysis, occupancy of clusters of orthologous groups of proteins (COG), and average length distribution showed that these non-coding ORFs were highly unlikely to encode proteins. Reverse transcription-polymerase chain reaction (RT-PCR) experiments at three different growth stages of A. tumefaciens C58 confirmed that 23 (79%) of the identified non-coding ORFs had no transcripts at these growth stages. In addition, 19 potential new protein-coding genes were predicted theoretically, and 15 (79%) of them were verified by RT-PCR. The RT-PCR results support the reliability of the theoretical prediction and indicate that both false-positive predictions and missing genes exist in the original annotation of the A. tumefaciens C58 genome. The improved annotation will serve as a valuable resource for research on the lifestyle, metabolism, and pathogenicity of A. tumefaciens C58. The re-annotation of A. tumefaciens C58 can be obtained from http://211.69.128.148/Atum/.
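For readers unfamiliar with the Z-curve representation named in the abstract, the following is a minimal sketch of the standard cumulative Z-curve transform of a DNA sequence. It is an illustration only and not the authors' gene-finding pipeline, which the abstract does not describe in detail. The function name `z_curve` and the example sequence are assumptions introduced here.

```python
from typing import List, Tuple


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Return the cumulative Z-curve points (x_n, y_n, z_n) for each prefix of seq.

    With A_n, C_n, G_n, T_n the base counts in the first n positions:
      x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
      y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
      z_n = (A_n + T_n) - (G_n + C_n)   weak vs. strong hydrogen bonding
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points: List[Tuple[int, int, int]] = []
    for base in seq.upper():
        if base not in counts:
            # Ambiguity codes such as N are rejected in this sketch.
            raise ValueError(f"unsupported base: {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the A. tumefaciens genome.
    for n, point in enumerate(z_curve("ATGC"), start=1):
        print(n, point)
```

Z-curve-based gene finders typically derive features from base frequencies at the three codon positions rather than from the raw cumulative curve, so this sketch shows only the underlying coordinate transform.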

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Agrobacterium tumefaciens / genetics*
  • Base Sequence
  • Gene Expression Regulation, Bacterial
  • Genes, Bacterial / genetics*
  • Models, Genetic*
  • Molecular Sequence Annotation
  • Molecular Sequence Data
  • Open Reading Frames / genetics*
  • Plants / microbiology*
  • Principal Component Analysis
  • Replicon / genetics
  • Reproducibility of Results
  • Reverse Transcriptase Polymerase Chain Reaction

Associated data

  • GENBANK/BK008582
  • GENBANK/BK008583
  • GENBANK/BK008584
  • GENBANK/BK008585
  • GENBANK/BK008586
  • GENBANK/BK008587
  • GENBANK/BK008588
  • GENBANK/BK008589
  • GENBANK/BK008590
  • GENBANK/BK008591
  • GENBANK/BK008592
  • GENBANK/BK008593
  • GENBANK/BK008594
  • GENBANK/BK008595
  • GENBANK/BK008596

Grants and funding

This study was supported by the Major International Joint Research Project, National Natural Science Foundation of China (31010103903, 31071659), and the National Basic Research Program of China (2010CB126100). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.