TAD boundary and strength prediction by integrating sequence and epigenetic profile information

Brief Bioinform. 2021 Sep 2;22(5):bbab139. doi: 10.1093/bib/bbab139.

Abstract

Topologically associated domains (TADs) are important higher-order chromatin structures of varying size in eukaryotic genomes. TAD boundaries, the regions that separate adjacent domains, can restrict interactions between regulatory elements such as enhancers and promoters, and are generally dynamic and variable across different cells. However, the contribution of sequence and epigenetic profile-based features to the identification of TAD boundaries is largely unknown. In this work, we propose pTADS (prediction of TAD boundary and strength), a method that predicts TAD boundaries and boundary strength across multiple cell lines from DNA sequence and epigenetic profile information. Performance was assessed in seven cell lines and with three TAD calling methods. The results demonstrate that TAD boundaries can be well predicted using a selected set of features shared across multiple cell lines. In particular, a model trained on one cell line can be transferred to predict TAD boundaries in other cell lines. Boundary strength can be characterized by a boundary score with good performance. The predicted TAD boundaries and boundary strengths are further confirmed by three Hi-C contact matrix-based methods across multiple cell lines. The code and datasets are available at https://github.com/chrom3DEpi/pTADS.
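To make the idea of integrating sequence and epigenetic profile information concrete, the sketch below builds one feature vector for a candidate genomic window. It is an illustration under stated assumptions, not the pTADS implementation: the abstract does not specify how features are encoded, so the k-mer frequency encoding, the per-track mean, the function names kmer_frequencies and window_features, and the track names CTCF and H3K4me3 are all hypothetical choices made here.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return the frequencies of all 4**k DNA k-mers in lexicographic order.

    Assumption: k-mer frequencies stand in for "sequence features";
    the abstract does not say which sequence encoding pTADS uses.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def window_features(seq, signals, k=2):
    """Concatenate k-mer frequencies with the mean of each signal track.

    `signals` maps a track name (hypothetical, e.g. "CTCF") to the list of
    signal values observed inside the window. Tracks are appended in sorted
    name order so the feature layout is the same for every window.
    """
    features = kmer_frequencies(seq, k)
    for name in sorted(signals):
        values = signals[name]
        features.append(sum(values) / len(values) if values else 0.0)
    return features


if __name__ == "__main__":
    # Toy window: four bases and two made-up epigenetic tracks.
    example = window_features(
        "ACGT",
        {"CTCF": [1.0, 3.0], "H3K4me3": [0.0, 2.0]},
        k=1,
    )
    print(example)
```

Vectors of this kind, computed for boundary and non-boundary windows, could then be passed to any standard classifier. Keeping the feature layout identical across cell lines is what would allow a model fitted on one cell line to be applied to another, as the abstract describes.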

Keywords: TAD boundary; boundary score; boundary strength; epigenetic profile; machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Cell Line
  • Chromatin / genetics*
  • Chromatin / metabolism
  • Computational Biology / methods*
  • DNA / genetics
  • DNA / metabolism
  • Enhancer Elements, Genetic / genetics
  • Epigenesis, Genetic*
  • Epigenomics / methods*
  • Genome, Human / genetics
  • Humans
  • K562 Cells
  • Promoter Regions, Genetic / genetics
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Reproducibility of Results

Substances

  • Chromatin
  • DNA