A binary search approach to whole-genome data analysis

Leonid Brodsky; Simon Kogan; Eshel Benjacob; Eviatar Nevo

doi:10.1073/pnas.1011134107

Proc Natl Acad Sci U S A. 2010 Sep 28;107(39):16893-8. doi: 10.1073/pnas.1011134107. Epub 2010 Sep 10.

Authors

Leonid Brodsky¹, Simon Kogan, Eshel Benjacob, Eviatar Nevo

Affiliation

¹ Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel. lbrodsky@research.haifa.ac.il

Abstract

A sequence analysis-oriented binary search-like algorithm was transformed into a sensitive and accurate analysis tool for processing whole-genome data. The advantage of the algorithm over previous methods is its ability to detect, with equal accuracy, the margins of both short and long genome fragments enriched in up-regulated signals. The score of an enriched genome fragment reflects the difference between the actual concentration of up-regulated signals in the fragment and the chromosome signal baseline. The "divide-and-conquer"-type algorithm detects a series of nonintersecting fragments of various lengths with locally optimal scores. The procedure is applied to detected fragments in a nested manner by recalculating the lower-than-baseline signals in the chromosome. The algorithm was applied to simulated whole-genome data, and its sensitivity and specificity were compared with those of several alternative algorithms. The algorithm was also tested with four biological tiling array datasets: Arabidopsis (i) expression and (ii) histone 3 lysine 27 trimethylation ChIP-on-chip datasets, and Saccharomyces cerevisiae (iii) spliced intron data and (iv) chromatin remodeling factor binding sites. The results demonstrate the power of the algorithm in identifying, at their precise genome margins, both short up-regulated fragments (such as exons and transcription factor binding sites) and long zones, even moderately up-regulated ones. The algorithm generates an accurate whole-genome landscape that could be used for cross-comparison of signals across the same genome in evolutionary and general genomic studies.
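The abstract does not specify the scoring function or the splitting rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a fragment's score is the sum of (signal minus baseline) over its positions, and it swaps in a standard maximum-sum segment search (Kadane's algorithm) applied recursively to the regions left and right of each detected fragment, which yields nonintersecting fragments with locally optimal scores. The names best_segment, enriched_fragments, baseline, and min_score are hypothetical and introduced for illustration only.

```python
from typing import List, Sequence, Tuple


def best_segment(x: Sequence[float], lo: int, hi: int,
                 baseline: float) -> Tuple[float, int, int]:
    """Return (score, start, end) of the best half-open segment in x[lo:hi].

    Assumed score: sum of (x[i] - baseline) over the segment (Kadane's scan).
    """
    best = (float("-inf"), lo, lo)
    run, run_start = 0.0, lo
    for i in range(lo, hi):
        if run <= 0:                    # a non-positive prefix never helps
            run, run_start = 0.0, i
        run += x[i] - baseline
        if run > best[0]:
            best = (run, run_start, i + 1)
    return best


def enriched_fragments(x: Sequence[float], baseline: float, min_score: float,
                       lo: int = 0, hi: int = None) -> List[Tuple[int, int, float]]:
    """Divide and conquer: take the best fragment, then recurse on both sides."""
    if hi is None:
        hi = len(x)
    if lo >= hi:
        return []
    score, start, end = best_segment(x, lo, hi, baseline)
    if score < min_score:               # nothing sufficiently enriched here
        return []
    left = enriched_fragments(x, baseline, min_score, lo, start)
    right = enriched_fragments(x, baseline, min_score, end, hi)
    return left + [(start, end, score)] + right


# Toy signal with one strong short fragment and one moderate fragment.
signal = [0, 0, 3, 3, 3, 0, 0, 0, 2, 2, 0]
fragments = enriched_fragments(signal, baseline=1.0, min_score=1.5)
print(fragments)  # [(2, 5, 6.0), (8, 10, 2.0)]
```

Fragments are reported as half-open index ranges with their scores. The nested step described in the abstract (re-running the procedure inside detected fragments with a recalculated baseline) is not shown, and the recursion as written is intended for short illustrative inputs rather than whole chromosomes.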

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Arabidopsis / genetics
Chromosome Mapping / statistics & numerical data
Gene Expression Profiling / statistics & numerical data
Genome-Wide Association Study / statistics & numerical data*
Introns
Oligonucleotide Array Sequence Analysis / statistics & numerical data
RNA Splicing
Saccharomyces cerevisiae / genetics
Sequence Analysis, DNA / methods*