Informatics issues in large-scale sequence analysis: elucidating the protein kinases of C. elegans

J Cell Biochem. 2000 Oct 20;80(2):181-6. doi: 10.1002/1097-4644(20010201)80:2<181::aid-jcb30>3.0.co;2-1.

Abstract

With the nearly complete genomic sequence of C. elegans now available, the first for a multicellular organism, molecular biology has entered the postgenomic era. Annotation of the genomic sequence, that is, identifying the genes and other biologically relevant sections of the genome, is an important and nontrivial next step. A first-pass annotation will necessarily be incomplete, but it will drive further biological experiments, which in turn will help refine the annotation. Given the scale of genome sequence analysis, annotation should be automated as much as possible without sacrificing the quality of the analysis. In this work, we outline our approach to identifying the protein kinases of C. elegans from the genomic sequence. We describe new tools that we have developed for the analysis, management, and visualization of genomic data. By developing modular and scalable solutions, this study provides a framework for future analysis of the Drosophila and human genomes.

MeSH terms

  • Animals
  • Caenorhabditis elegans / enzymology*
  • Computational Biology*
  • Database Management Systems
  • Protein Kinases / genetics*
  • Software

Substances

  • Protein Kinases