Improved annotation of a plant pathogen genome Xanthomonas oryzae pv. oryzae PXO99A

J Biomol Struct Dyn. 2013 Mar;31(3):342-50. doi: 10.1080/07391102.2012.698218. Epub 2012 Aug 1.

Abstract

Many bacterial genomes have now been sequenced and deposited in public databases, of which Reference Sequence (RefSeq) is the most widely used. However, the annotation in RefSeq is still unsatisfactory. The present analysis focuses on the re-annotation of the genome of an important plant pathogen, Xanthomonas oryzae pv. oryzae PXO99A (Xoo PXO99A), the causal agent of bacterial blight of rice. Using 28 nucleotide-frequency parameters and a support vector machine algorithm, 41 originally annotated hypothetical genes were recognized as noncoding sequences, a result further supported by principal component analysis and other evidence. Ten of them were tested with reverse transcription-polymerase chain reaction (RT-PCR) experiments, and all ten were confirmed to be noncoding sequences. Furthermore, 197 potential new genes not annotated in RefSeq were recognized by both of two ab initio gene-finding programs. Most of them show sequence similarity to only part of known genes in other species, so they are unlikely to be protein-coding genes. Twelve potential new genes have high full-length sequence similarity to genes of known function and are very likely to be true protein-coding genes. All 12 were tested with RT-PCR, and 11 of them (92%) were successfully amplified from cDNA template. The RT-PCR experiments confirm that the theoretical prediction has high accuracy. The improved annotation of Xoo PXO99A will aid research on the lifestyle, metabolism, and pathogenicity of this important plant pathogen. The improved annotation can be obtained from http://211.69.128.148/Xoo.
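
The abstract does not say which 28 nucleotide frequencies were used. As a minimal sketch only, the Python function below assumes one plausible decomposition: 12 codon-position-specific mononucleotide frequencies (4 bases at each of 3 codon positions) plus 16 dinucleotide frequencies. The function name `nucleotide_features` and this 12 + 16 split are illustrative assumptions, not the authors' stated method. A vector of this kind could then be passed to a support vector machine classifier to separate coding from noncoding open reading frames.

```python
from itertools import product

BASES = "ACGT"


def nucleotide_features(seq):
    """Return a 28-element frequency vector for one ORF sequence.

    Assumed layout (hypothetical, not taken from the paper):
      indices 0-11  : base frequencies at codon positions 1, 2, 3 (A, C, G, T each)
      indices 12-27 : overlapping dinucleotide frequencies (AA, AC, ..., TT)
    """
    seq = seq.upper()
    features = []

    # 12 features: frequency of each base at each codon position.
    for phase in range(3):
        column = seq[phase::3]
        total = sum(column.count(b) for b in BASES)
        for b in BASES:
            features.append(column.count(b) / total if total else 0.0)

    # 16 features: frequency of each overlapping dinucleotide,
    # ignoring pairs that contain ambiguous symbols such as N.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if all(c in BASES for c in p)]
    for a, b in product(BASES, repeat=2):
        features.append(valid.count(a + b) / len(valid) if valid else 0.0)

    return features


if __name__ == "__main__":
    vector = nucleotide_features("ATGAAATAG")
    print(len(vector))
```

Each group of frequencies is normalized separately, so sequences of different lengths yield comparable vectors.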

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • DNA, Bacterial / genetics
  • DNA, Intergenic / genetics
  • Genes, Bacterial / genetics
  • Genome, Bacterial / genetics*
  • Molecular Sequence Annotation*
  • Open Reading Frames / genetics
  • Oryza / microbiology*
  • Polymerase Chain Reaction
  • Reproducibility of Results
  • Xanthomonas / genetics*

Substances

  • DNA, Bacterial
  • DNA, Intergenic