An incremental algorithm for Z-value computations

Comput Chem. 2002 Jul;26(5):403-11. doi: 10.1016/s0097-8485(02)00003-7.

Abstract

The Z-value (Comput. Chem. 23 (1999) 333) is an extension of the Z-score classically used to compare sets of biological sequences. The Z-value has been used successfully in complete genome studies and in the analysis of large sets of proteins. Its computation relies on a Monte Carlo approach to estimate the statistical significance of a Smith-Waterman alignment score. Comet et al. (Comput. Chem. 23 (1999) 333) have shown that, in contrast to the alignment score, the Z-value largely reduces the bias due to the lengths and compositions of the sequences. They also described an estimator of the deviation of Z-values, which we extend in this paper in order to optimize Z-value computation. The incremental algorithm described here combines two characteristics that are usually incompatible: (i) it improves the accuracy of the Z-value calculation; (ii) it reduces the time complexity. The algorithm is called incremental because it iteratively adds random sequences to the Monte Carlo process when needed. Results are presented from the all-by-all comparison of the proteins of Saccharomyces cerevisiae and Escherichia coli.
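
As a rough illustration of the idea summarized above, the sketch below computes a Monte Carlo Z-value as (score - mean) / standard deviation, where the mean and standard deviation are taken over Smith-Waterman scores against shuffled copies of one sequence, and it adds shuffles in batches only while the result remains undecided. It is not the published algorithm: the toy scoring scheme (match, mismatch, linear gap), the choice to shuffle only the second sequence, the function names, and in particular the stopping rule (an arbitrary "undecided" band on the Z-value) are all assumptions made for this example. The paper bases its decision on an estimator of the deviation of Z-values, which the abstract does not specify.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score; toy scoring with a linear gap penalty (assumed)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def shuffled(seq, rng):
    """Random permutation of seq; preserves length and composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def z_value(score, random_scores):
    """(score - mean) / sample standard deviation of the random scores."""
    mean = statistics.fmean(random_scores)
    sd = statistics.stdev(random_scores)
    if sd == 0:
        return float("inf") if score > mean else 0.0
    return (score - mean) / sd


def incremental_z_value(a, b, start=20, step=20, max_shuffles=200,
                        band=(5.0, 11.0), seed=0):
    """Add shuffles in batches while Z stays inside the 'undecided' band.

    The band and batch sizes are hypothetical placeholders, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    score = smith_waterman_score(a, b)
    random_scores = []
    target = start
    while True:
        while len(random_scores) < target:
            random_scores.append(smith_waterman_score(a, shuffled(b, rng)))
        z = z_value(score, random_scores)
        undecided = band[0] <= z <= band[1]
        if not undecided or target >= max_shuffles:
            return z, len(random_scores)
        target = min(target + step, max_shuffles)
```

Calling `incremental_z_value(seq_a, seq_b)` returns the Z-value together with the number of shuffles actually used. Under these assumptions, clearly related or clearly unrelated pairs stop after the first batch, and only borderline pairs pay for additional random sequences, which is the trade-off between accuracy and computation time that the abstract describes.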

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Escherichia coli Proteins / chemistry
  • Escherichia coli Proteins / genetics
  • Genome, Bacterial
  • Genome, Fungal
  • Monte Carlo Method
  • Proteome
  • Saccharomyces cerevisiae Proteins / chemistry
  • Saccharomyces cerevisiae Proteins / genetics
  • Sequence Alignment / methods*

Substances

  • Escherichia coli Proteins
  • Proteome
  • Saccharomyces cerevisiae Proteins