[A new method for EST clustering]

Yi Chuan Xue Bao. 2003 Feb;30(2):147-53.
[Article in Chinese]

Abstract

We developed an EST (expressed sequence tag) clustering method, ESTClustering, to generate high-quality unique expressed sequences from large-scale EST sequencing data. During clustering, the method compares sequences against cluster consensus sequences with megablast and assembles each cluster with phrap. This strategy can efficiently identify gene families and alternative splicing forms of expressed sequences, and it reduces the adverse effects of sequence errors. Compared with the UniGene clustering method of the National Center for Biotechnology Information, ESTClustering tends to yield more expressed gene forms. Analysis of 112,256 Arabidopsis ESTs with ESTClustering produced 23,581 EST clusters. Of these Arabidopsis EST clusters, 13,597 have corresponding genomic coding sequences, a number close to the number of genes predicted with Arabidopsis ESTs. Using the same method, a total of 147,191 rice ESTs were clustered into 33,896 groups.
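
The general idea of clustering against consensus sequences can be illustrated with a minimal sketch. It is not the ESTClustering implementation. As assumptions for illustration only, a shared k-mer score stands in for the megablast comparison, and keeping the longest member stands in for the phrap assembly step. The names `kmers`, `similarity`, `cluster_ests`, and the `threshold` parameter are hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 8) -> Set[str]:
    """Return the set of k-mers in seq (toy stand-in for a megablast search)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 8) -> float:
    """Fraction of the smaller k-mer set that is shared between a and b."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def cluster_ests(ests: Dict[str, str], threshold: float = 0.5) -> List[dict]:
    """Greedily assign each EST to the best-matching cluster consensus."""
    clusters: List[dict] = []
    for name, seq in ests.items():
        best, best_score = None, threshold
        for cluster in clusters:
            score = similarity(seq, cluster["consensus"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            # No consensus is similar enough: start a new cluster.
            clusters.append({"consensus": seq, "members": [name]})
        else:
            best["members"].append(name)
            # Assumption: keep the longest member as the "consensus".
            # A real pipeline would reassemble the cluster (e.g. with phrap).
            if len(seq) > len(best["consensus"]):
                best["consensus"] = seq
    return clusters


# Toy data: two overlapping reads from one made-up transcript, one unrelated.
gene = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCTAAGTCC"
ests = {
    "est1": gene[:32],
    "est2": gene[8:],
    "est3": "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGG",
}
clusters = cluster_ests(ests)
print([c["members"] for c in clusters])  # [['est1', 'est2'], ['est3']]
```

Comparing each new read against one consensus per cluster, rather than against every member, keeps the number of comparisons proportional to the number of clusters. The toy consensus here does not merge overlapping reads, so it cannot correct sequence errors the way a real assembly step would.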

Publication types

  • English Abstract
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Computational Biology / methods
  • DNA, Complementary / genetics
  • Databases, Nucleic Acid
  • Expressed Sequence Tags*
  • Oryza / genetics

Substances

  • DNA, Complementary