A Short Report on the Markov Property of DNA Sequences on 200-bp Genomic Units of ENCODE/Broad ChromHMM Annotations: A Computational Perspective

Genomics Inform. 2018 Sep;16(3):65-70. doi: 10.5808/GI.2018.16.3.65. Epub 2018 Sep 30.

Abstract

The non-coding DNA in eukaryotic genomes encodes a language that programs chromatin accessibility, transcription factor binding, and various other activities. The objective of this short report was to determine the impact of primary DNA sequence on the epigenomic landscape across 200-base pair (bp) genomic units by integrating nine publicly available ChromHMM Browser Extensible Data files of the Encyclopedia of DNA Elements (ENCODE) project. The nucleotide frequency profiles of nine chromatin annotations were analyzed in 200-bp units, and integrative Markov chains were built to detect the Markov properties of the DNA sequences in some of the active chromatin states of different ChromHMM regions. Our aim was to identify the possible relationship between DNA sequences and the newly built chromatin states based on the integrated ChromHMM datasets of different cell and tissue types.
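
As a rough illustration of the two quantities mentioned above, the following minimal sketch computes a nucleotide frequency profile and a first-order Markov transition matrix from a set of DNA sequences. It is not the authors' pipeline: the function names, the choice of a first-order chain, and the toy sequences (which stand in for 200-bp units) are all assumptions made for this example.

```python
from collections import Counter

ALPHABET = "ACGT"  # bases outside this set (e.g. N) are ignored


def nucleotide_frequencies(seq):
    """Return the relative frequency of A, C, G, T in one sequence."""
    seq = seq.upper()
    counts = Counter(base for base in seq if base in ALPHABET)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in ALPHABET}


def transition_matrix(seqs):
    """Estimate first-order transition probabilities P(next | current)
    by counting adjacent base pairs across all sequences."""
    counts = {a: {b: 0 for b in ALPHABET} for a in ALPHABET}
    for seq in seqs:
        seq = seq.upper()
        for cur, nxt in zip(seq, seq[1:]):
            # Skip any pair that contains an ambiguous base.
            if cur in counts and nxt in counts[cur]:
                counts[cur][nxt] += 1
    matrix = {}
    for a in ALPHABET:
        row_total = sum(counts[a].values())
        matrix[a] = {
            b: (counts[a][b] / row_total if row_total else 0.0)
            for b in ALPHABET
        }
    return matrix


if __name__ == "__main__":
    # Hypothetical toy sequences; real units would be 200 bp long.
    units = ["ACGTACGTAC", "CCGGCCGGTT"]
    print(nucleotide_frequencies(units[0]))
    print(transition_matrix(units))
```

Comparing such transition matrices between chromatin states is one simple way to ask whether sequence composition differs among annotations; a higher-order chain would condition on more than one preceding base.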

Keywords: Markov chain; chromatin maps; computational epigenetics; noncoding DNA; nucleotide frequency patterns.