OLOGRAM: Determining significance of total overlap length between genomic regions sets

Bioinformatics. 2019 Nov 5:btz810. doi: 10.1093/bioinformatics/btz810. Online ahead of print.

Abstract

Motivation: Various bioinformatics analyses provide sets of genomic coordinates of interest. Whether two such sets possess a functional relation is a frequent question. This is often determined by interpreting the statistical significance of their overlaps. However, only a few existing methods consider the lengths of the overlaps, and they do not provide a p-value of sufficient resolution.

Results: Here, we introduce OLOGRAM, which performs overlap statistics between sets of genomic regions described in BED or GTF files. It uses Monte Carlo simulation, taking into account the distributions of both region lengths and inter-region lengths, to fit a negative binomial model of the total overlap length. Exclusion of user-defined genomic areas during the shuffling is supported.
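
The approach described above can be illustrated with a minimal Python sketch. It is not the pygtftk implementation, and it rests on several simplifying assumptions: a single chromosome, non-empty region sets given as sorted, non-overlapping half-open (start, end) tuples, both sets shuffled, no excluded areas, and a method-of-moments fit of the negative binomial. All function names (total_overlap, shuffle_regions, nb_upper_tail, overlap_p_value) are hypothetical.

```python
import math
import random
import statistics


def total_overlap(a, b):
    """Total overlap (bp) between two sorted, non-overlapping interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def shuffle_regions(regions, chrom_len, rng):
    """Permute region lengths and inter-region lengths independently,
    then rebuild a region set with the same two length distributions."""
    lengths = [end - start for start, end in regions]
    gaps = [regions[0][0]]
    gaps += [regions[k + 1][0] - regions[k][1] for k in range(len(regions) - 1)]
    gaps.append(chrom_len - regions[-1][1])
    rng.shuffle(lengths)
    rng.shuffle(gaps)
    shuffled, pos = [], 0
    # zip stops after the last region; the leftover gap is the trailing one.
    for gap, length in zip(gaps, lengths):
        pos += gap
        shuffled.append((pos, pos + length))
        pos += length
    return shuffled


def nb_upper_tail(x, mean, var):
    """P(X >= x) for a negative binomial fitted by the method of moments.
    Computed as 1 - CDF, so very small p-values lose precision here."""
    if var <= mean:
        raise ValueError("negative binomial requires variance > mean")
    p = mean / var
    r = mean * mean / (var - mean)

    def log_pmf(k):
        return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
                + r * math.log(p) + k * math.log1p(-p))

    cdf = sum(math.exp(log_pmf(k)) for k in range(x))
    return max(0.0, 1.0 - cdf)


def overlap_p_value(query, ref, chrom_len, n_shuffles=200, seed=0):
    """Observed overlap, simulated mean and variance, and fitted p-value."""
    rng = random.Random(seed)
    observed = total_overlap(query, ref)
    sims = [
        total_overlap(shuffle_regions(query, chrom_len, rng),
                      shuffle_regions(ref, chrom_len, rng))
        for _ in range(n_shuffles)
    ]
    mean = statistics.fmean(sims)
    var = statistics.variance(sims)
    return observed, mean, var, nb_upper_tail(observed, mean, var)
```

Fitting a parametric model to the simulated overlaps, rather than counting how many shuffles exceed the observed value, is what allows a p-value smaller than one over the number of shuffles. The sketch raises an error when the simulated variance does not exceed the mean, since the moment-based fit is undefined in that case.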

Availability: This tool is available through the command line interface of the pygtftk toolkit. It has been tested on Linux and OSX and is available on Bioconda and from https://github.com/dputhier/pygtftk under the GNU GPL license.

Supplementary information: Supplementary data are available at Bioinformatics online.