Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes

Gene. 2012 Jul 15;503(1):56-64. doi: 10.1016/j.gene.2012.04.043. Epub 2012 Apr 24.

Abstract

Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or by their neighboring positions in close proximity within a chromosomal region. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step toward revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into the evolutionary processes by which certain biological functions are acquired. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficients of neighboring gene pairs with those of randomly selected gene pairs, using a statistical method that takes the false discovery rate (FDR) into consideration, across 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. At a stringent FDR threshold of 0.1, we predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined by BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism and contain P450 genes that are restricted to the Brassica family and predicted to be involved in secondary metabolism. Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system.
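
The abstract describes the approach only in outline, so the following is a minimal sketch of one way such a comparison could be set up, not the authors' actual procedure. It assumes an expression matrix with genes as rows in chromosomal order, an empirical null built from randomly drawn non-adjacent gene pairs, and Benjamini-Hochberg adjustment as the FDR step. All function names, the number of random pairs, and the rule of merging consecutive significant pairs into clusters are illustrative assumptions.

```python
import numpy as np

def pearson_rows(a, b):
    """Row-wise Pearson correlation between two (pairs x samples) arrays."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

def bh_qvalues(p):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * p.size / np.arange(1, p.size + 1)
    # Enforce monotonicity from the largest p-value downwards.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty_like(p)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

def neighbor_pair_qvalues(expr, n_random=10000, seed=0):
    """expr: genes x samples, rows ordered along one chromosome (assumption).

    Returns the correlation of each adjacent gene pair and its q-value,
    using random non-adjacent pairs as an empirical null distribution.
    """
    rng = np.random.default_rng(seed)
    n = expr.shape[0]
    r_adj = pearson_rows(expr[:-1], expr[1:])
    i = rng.integers(0, n, size=n_random)
    j = rng.integers(0, n, size=n_random)
    keep = np.abs(i - j) > 1  # drop self-pairs and adjacent pairs
    r_null = np.sort(pearson_rows(expr[i[keep]], expr[j[keep]]))
    # One-sided empirical p-value: fraction of null pairs at least as correlated.
    n_ge = r_null.size - np.searchsorted(r_null, r_adj, side="left")
    p = (n_ge + 1) / (r_null.size + 1)
    return r_adj, bh_qvalues(p)

def call_clusters(q, fdr=0.1, min_genes=3):
    """Merge runs of consecutive significant adjacent pairs into clusters.

    Pair k links genes k and k+1; returns (first_gene, last_gene) index tuples.
    """
    clusters, start = [], None
    for k, sig in enumerate(np.asarray(q) <= fdr):
        if sig and start is None:
            start = k
        elif not sig and start is not None:
            if k - start + 1 >= min_genes:
                clusters.append((start, k))
            start = None
    if start is not None and len(q) - start + 1 >= min_genes:
        clusters.append((start, len(q)))
    return clusters
```

Under these assumptions, calling `neighbor_pair_qvalues(expr)` on a per-chromosome matrix and passing the resulting q-values to `call_clusters(q, fdr=0.1, min_genes=3)` would yield candidate runs of three or more co-expressed neighboring genes, mirroring the FDR threshold and minimum cluster size quoted in the abstract.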

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Antigens, Plant / genetics
  • Antigens, Plant / metabolism
  • Arabidopsis / genetics*
  • Arabidopsis / metabolism
  • Arabidopsis Proteins / genetics
  • Arabidopsis Proteins / metabolism
  • Brassica / genetics
  • Brassica / metabolism
  • Carrier Proteins / genetics
  • Carrier Proteins / metabolism
  • Cytochrome P-450 Enzyme System / genetics
  • False Positive Reactions
  • Fatty Acids / genetics
  • Fatty Acids / metabolism
  • Gene Expression Profiling / methods
  • Gene Expression Profiling / statistics & numerical data
  • Gene Expression Regulation, Plant / genetics
  • Genome, Plant*
  • Lipid Metabolism / genetics
  • Metabolic Networks and Pathways / genetics
  • Models, Genetic
  • Multigene Family*
  • Operon / genetics*
  • Plant Proteins / genetics
  • Plant Proteins / metabolism
  • Proteasome Endopeptidase Complex / genetics
  • Proteasome Endopeptidase Complex / metabolism
  • Ribosomes / genetics
  • Ribosomes / metabolism
  • Ubiquitin / genetics
  • Ubiquitin / metabolism

Substances

  • Antigens, Plant
  • Arabidopsis Proteins
  • Carrier Proteins
  • Fatty Acids
  • Plant Proteins
  • Ubiquitin
  • lipid transfer proteins, plant
  • Cytochrome P-450 Enzyme System
  • Proteasome Endopeptidase Complex