Database and analysis system for cDNA clones obtained from full-length enriched cDNA libraries

In Silico Biol. 2002;2(1):5-18.

Abstract

We have developed an efficient sequence-analysis system and a database system for clones obtained from full-length enriched cDNA libraries constructed by the oligo-capping method. Our semi-automatic analysis system for 5'- and 3'-end sequences pre-processes raw sequences (vector removal and extraction of the accurately sequenced region), clusters the sequences, searches public databases for similar sequences, annotates the completeness of each clone, and analyzes the ORFs in the sequences. Each step uses newly developed or improved programs. A new program, ESTiMateFull, evaluates sequence fullness by comparison with mRNA sequences and predicts it by comparison with EST sequences. The ATGpr program predicts sequence fullness from statistical information. Combining full-length enriched cDNA clones with ATGpr fullness prediction achieved 70% in both the specificity and the sensitivity of the fullness predictions. For the ORFs predicted by ATGpr, our new system predicts signal peptides and performs a motif search. We also developed a program that assembles our sequences with dbEST sequences, and a system that retrieves clones by the characteristics of their ORFs. Combinations of the various analysis results can be used as retrieval keywords, and results such as ORF features and database search hits can be displayed side by side on a single screen. Full-length clones with functions of interest can thus be retrieved efficiently with this system.
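The fullness-prediction result above is stated as specificity and sensitivity without defining either measure. The following is a minimal Python sketch of how such figures are typically computed, assuming each clone has a boolean predicted label and a boolean reference label (full-length or not). The function names `confusion_counts` and `fullness_metrics` and the toy labels are hypothetical illustrations, not taken from the paper. It uses the common definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP). Some sequence-analysis papers instead use "specificity" to mean TP/(TP+FP), and the abstract does not say which definition applies, so the sketch reports that quantity as precision as well.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(predicted: Sequence[bool], actual: Sequence[bool]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating True as 'full-length clone'.

    Hypothetical helper: labels are assumed to be one boolean per clone.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1  # predicted full-length, really full-length
        elif p and not a:
            fp += 1  # predicted full-length, really truncated
        elif not p and not a:
            tn += 1  # predicted truncated, really truncated
        else:
            fn += 1  # predicted truncated, really full-length
    return tp, fp, tn, fn


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a class is absent from the data.
    return num / den if den else float("nan")


def fullness_metrics(predicted: Sequence[bool], actual: Sequence[bool]) -> dict[str, float]:
    """Sensitivity, specificity (TN/(TN+FP)) and precision (TP/(TP+FP))."""
    tp, fp, tn, fn = confusion_counts(predicted, actual)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Toy labels for ten hypothetical clones (not data from the paper).
    predicted = [True, True, False, True, False, False, True, False, True, False]
    actual = [True, False, False, True, True, False, True, False, True, True]
    print(confusion_counts(predicted, actual))
    print(fullness_metrics(predicted, actual))
```

With these toy labels there are 4 true positives, 1 false positive, 3 true negatives, and 2 false negatives. This gives a sensitivity of 4/6, a specificity of 3/4, and a precision of 4/5. Reporting both specificity variants makes it clear which convention a quoted figure such as "70% specificity" follows.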

MeSH terms

  • Amino Acid Sequence
  • Base Sequence
  • Cloning, Molecular
  • DNA, Complementary*
  • Databases, Nucleic Acid*
  • Expressed Sequence Tags
  • Gene Library
  • Image Processing, Computer-Assisted / methods
  • Molecular Sequence Data
  • Sequence Analysis, DNA / methods*
  • Software*

Substances

  • DNA, Complementary