Genome-Wide Study of Colocalization between Genomic Stretches: A Method and Applications to the Regulation of Gene Expression

Biology (Basel). 2022 Sep 29;11(10):1422. doi: 10.3390/biology11101422.

Abstract

In this paper, we describe a method for studying colocalization between stretch-stretch and stretch-point genome tracks based on a set of indices that vary within the (-1, +1) interval. The indices combine the distances between the centers of neighboring stretches with the stretch lengths. The two boundaries of the interval correspond to complete colocalization of the genome tracks and to its complete absence. We also derived criteria of statistical significance for these indices using the complete permutation test. The method is robust to strongly inhomogeneous positioning and length distributions of the genome tracks. On the basis of this approach, we created command-line software, the Genome Track Colocalization Analyzer (GTCA). The software was tested, compared with other available packages, and applied to particular problems related to gene expression. GTCA is freely available. It complements our previous software, the Genome Track Analyzer, which searches for pairwise correlations between point-like genome tracks and is also freely available. The corresponding details are provided in the Data Availability Statement at the end of the text.
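
To make the general idea concrete, the following is a minimal illustrative sketch in Python. It is not the GTCA index or the GTCA implementation: the abstract does not give the index formula, so `toy_index` is a hypothetical score that merely shares the stated ingredients (distance between centers of neighboring stretches, combined with their lengths, bounded by -1 and +1). Likewise, the significance step uses random circular shifts of one track as a simple Monte Carlo stand-in, which is an assumption and not the complete permutation test used in the paper. All function names, the `(start, end)` track representation, and the example coordinates are assumptions made for illustration.

```python
import bisect
import random


def toy_index(track_a, track_b):
    """Mean toy colocalization score of track_a against track_b.

    Each track is a list of (start, end) tuples on one chromosome.
    HYPOTHETICAL score, not the index defined in the paper:
    for each stretch in A, take the nearest stretch in B (by center) and
    compute (h - d) / (h + d), where d is the distance between centers and
    h is the sum of the two half-lengths. The score is +1 when the centers
    coincide and tends to -1 as the stretches move far apart.
    """
    b_sorted = sorted(((s + e) / 2, e - s) for s, e in track_b)
    b_centers = [c for c, _ in b_sorted]
    scores = []
    for s, e in track_a:
        center, length = (s + e) / 2, e - s
        i = bisect.bisect_left(b_centers, center)
        # The nearest center is one of the two neighbors of the insertion point.
        candidates = b_sorted[max(i - 1, 0): i + 1]
        cb, lb = min(candidates, key=lambda cl: abs(cl[0] - center))
        d = abs(cb - center)
        h = (length + lb) / 2
        # Guard for coinciding zero-length (point-like) features.
        scores.append(1.0 if h + d == 0 else (h - d) / (h + d))
    return sum(scores) / len(scores)


def shifted(track, offset, genome_length):
    """Circularly shift stretch starts by offset, keeping lengths.

    Simplification: a stretch that starts near the end may extend past
    genome_length; a real tool would handle chromosome boundaries.
    """
    out = []
    for s, e in track:
        new_start = (s + offset) % genome_length
        out.append((new_start, new_start + (e - s)))
    return out


def randomization_p_value(track_a, track_b, genome_length, n_iter=999, seed=0):
    """One-sided Monte Carlo p-value for the toy index (assumed null model:
    track B placed at a random circular offset relative to track A)."""
    rng = random.Random(seed)
    observed = toy_index(track_a, track_b)
    hits = 0
    for _ in range(n_iter):
        offset = rng.randrange(1, genome_length)
        if toy_index(track_a, shifted(track_b, offset, genome_length)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    a = [(100, 200), (1000, 1200), (5000, 5100)]
    b = [(120, 220), (1050, 1150), (7000, 7100)]
    print(randomization_p_value(a, b, genome_length=10_000))
```

The circular shift preserves the spacing and lengths within the shifted track, which is one simple way to keep a null model from being misled by inhomogeneous positioning; whether it matches the paper's null model is not stated in the abstract.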

Keywords: CpG islands (CGI); GWAS; bioinformatic tool; biostatistics; epigenetics; genome tracks; histone mark H2A.Z; stretches; transcription start site (TSS).