Detecting patterns of protein distribution and gene expression in silico

Proc Natl Acad Sci U S A. 1999 Mar 16;96(6):2937-42. doi: 10.1073/pnas.96.6.2937.

Abstract

Most biological information is contained within gene and genome sequences. However, current methods for analyzing these data are limited primarily to the prediction of coding regions and identification of sequence similarities. We have developed a computer algorithm, CoSMoS (for context sensitive motif searches), which adds context sensitivity to sequence motif searches. CoSMoS was challenged to identify genes encoding peroxisome-associated proteins and oleate-induced genes in the yeast Saccharomyces cerevisiae. Specifically, we searched for genes capable of encoding proteins with a type 1 or type 2 peroxisomal targeting signal (PTS) and for genes containing the oleate-response element, a cis-acting element common to fatty acid-regulated genes. CoSMoS successfully identified 7 of 8 known PTS-containing peroxisomal proteins and 13 of 14 known oleate-regulated genes. More importantly, CoSMoS identified an additional 18 candidate peroxisomal proteins and 300 candidate oleate-regulated genes. Preliminary localization studies suggest that these include at least 10 previously unknown peroxisomal proteins. Phenotypic studies of selected gene disruption mutants suggest that several of these new peroxisomal proteins play roles in growth on fatty acids, that one is involved in peroxisome biogenesis, and that at least two are required for synthesis of lysine, a heretofore unrecognized role for peroxisomes. These results expand our understanding of peroxisome content and function, demonstrate the utility of CoSMoS for context-sensitive motif scanning, and point to the benefits of improved in silico genome analysis.
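
The idea of a context-sensitive motif search can be illustrated with a minimal Python sketch. This is not the CoSMoS implementation, which the abstract does not describe; the `ContextMotif` type, the `find_in_context` function, the two regular expressions, and the 40-residue N-terminal window are all hypothetical simplifications chosen for illustration. The sketch only shows how restricting a motif to a positional context (here, the extreme C-terminus or an N-terminal window of a protein sequence) changes which matches are reported.

```python
import re
from typing import List, NamedTuple, Optional


class ContextMotif(NamedTuple):
    """A motif plus the sequence window in which it is allowed to occur.

    All fields are illustrative assumptions, not taken from the paper.
    """
    name: str
    pattern: str                 # regular expression for the motif itself
    window_start: Optional[int]  # slice-style bounds; negative counts from the end
    window_end: Optional[int]


def find_in_context(seq: str, motif: ContextMotif) -> List[int]:
    """Return 0-based start positions of matches lying entirely inside the window."""
    start, end, _ = slice(motif.window_start, motif.window_end).indices(len(seq))
    window = seq[start:end]
    # A lookahead lets overlapping matches be reported as well.
    hits = re.finditer(f"(?=({motif.pattern}))", window)
    return [start + hit.start() for hit in hits]


# Simplified, assumed patterns (not the patterns used by CoSMoS):
# a tripeptide that must be the last three residues of the protein ...
PTS1_LIKE = ContextMotif("PTS1-like", "[SAC][KRH][LM]", -3, None)
# ... and a nonapeptide that must fall within the first 40 residues.
PTS2_LIKE = ContextMotif("PTS2-like", "[RK][LVI].{5}[HQ][LA]", 0, 40)

if __name__ == "__main__":
    protein = "MSKLAAAAAASKL"  # made-up sequence with SKL both internally and at the end
    print(find_in_context(protein, PTS1_LIKE))
```

In the made-up sequence above, a context-free search would also report the internal `SKL`, whereas the windowed search keeps only the C-terminal one. A promoter element could be handled the same way by passing upstream DNA as `seq` with a suitable window.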

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.
  • Retracted Publication

MeSH terms

  • Gene Expression Regulation, Fungal*
  • Genes, Fungal*
  • Genome, Fungal*
  • Microbodies / genetics*
  • Saccharomyces cerevisiae / genetics*
  • Sequence Analysis, DNA / methods*
  • Software