An incremental algorithm for Z-value computations

J C Aude; A Louis

doi:10.1016/s0097-8485(02)00003-7

Comput Chem. 2002 Jul;26(5):403-11. doi: 10.1016/s0097-8485(02)00003-7.

Authors

J C Aude¹, A Louis

Affiliation

¹ CEA/DSV/DBJC, CEA Saclay, Gif-sur-Yvette, France. jean-christophe.aude@cea.fr

PMID: 12144171
DOI: 10.1016/s0097-8485(02)00003-7

Abstract

The Z-value (Comput. Chem. 23 (1999) 333) is an extension of the Z-score classically used to compare sets of biological sequences. It has been used successfully in complete-genome studies and in the analysis of large sets of proteins. The Z-value computation relies on a Monte Carlo approach to estimate the statistical significance of a Smith-Waterman alignment score. Comet et al. (Comput. Chem. 23 (1999) 333) have shown that, in contrast to the alignment score, the Z-value largely reduces the bias due to the lengths and compositions of the sequences. They also described an estimator of the deviation of Z-values, which we extend in this paper in order to optimize Z-value computation. The incremental algorithm described here combines two characteristics that are usually incompatible: (i) it improves the accuracy of the Z-value calculation; (ii) it reduces the time complexity. The algorithm is called incremental because it iteratively adds random sequences to the Monte Carlo process when needed. Results are presented from the all-by-all comparison of the proteins of Saccharomyces cerevisiae and Escherichia coli.
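
As a rough illustration of the idea summarized in the abstract, the Python sketch below estimates a Monte Carlo Z-score for a local alignment score and adds shuffled sequences in batches until the estimate looks stable. It is not the authors' algorithm: the scoring scheme (match/mismatch with a linear gap penalty), the names `sw_score` and `incremental_z`, the parameters `batch`, `max_shuffles` and `tol`, and the normal-theory stopping rule are all assumptions made for this example. It shuffles only one of the two sequences and does not attempt to reproduce the exact Z-value definition or the deviation estimator from the cited papers.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman recurrence).

    Assumption: simple match/mismatch scores and a linear gap penalty,
    chosen for brevity instead of a substitution matrix and affine gaps.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def incremental_z(a, b, batch=20, max_shuffles=200, tol=0.5, rng=None):
    """Estimate a Z-score for (a, b), adding shuffles of b batch by batch.

    Hypothetical stopping rule: stop once a normal-theory approximation of
    the standard error of Z, sqrt((1 + Z^2 / 2) / n), drops below `tol`,
    or once `max_shuffles` random sequences have been scored.
    Returns the Z-score estimate and the number of shuffles used.
    """
    rng = rng or random.Random(0)
    observed = sw_score(a, b)
    letters = list(b)
    scores = []
    z = 0.0
    while len(scores) < max_shuffles:
        # Add one more batch of random sequences to the Monte Carlo sample.
        for _ in range(batch):
            rng.shuffle(letters)
            scores.append(sw_score(a, "".join(letters)))
        mu = statistics.mean(scores)
        sigma = statistics.stdev(scores)
        if sigma == 0:
            continue  # degenerate sample so far; draw another batch
        z = (observed - mu) / sigma
        stderr = ((1 + z * z / 2) / len(scores)) ** 0.5
        if stderr < tol:
            break
    return z, len(scores)


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    z, n = incremental_z(seq, seq)
    print(f"Z estimate {z:.2f} from {n} shuffled sequences")
```

Under this stopping rule, pairs with a Z-score near zero stop after few shuffles, while pairs with a large Z-score keep drawing batches, which mirrors the "add random sequences only when needed" behavior described above without claiming to match the published criterion.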

MeSH terms

Algorithms*
Computational Biology / methods*
Escherichia coli Proteins / chemistry
Escherichia coli Proteins / genetics
Genome, Bacterial
Genome, Fungal
Monte Carlo Method
Proteome
Saccharomyces cerevisiae Proteins / chemistry
Saccharomyces cerevisiae Proteins / genetics
Sequence Alignment / methods*

Substances

Escherichia coli Proteins
Proteome
Saccharomyces cerevisiae Proteins