Uncovering the Relationship between Tissue-Specific TF-DNA Binding and Chromatin Features through a Transformer-Based Model

Genes (Basel). 2022 Oct 26;13(11):1952. doi: 10.3390/genes13111952.

Abstract

Chromatin features can reveal tissue-specific TF-DNA binding, and understanding this binding sheds light on many critical physiological processes. Accurately identifying TF-DNA binding and characterizing its relationship with chromatin features is a long-standing goal in bioinformatics. However, this goal has remained elusive because of complex binding mechanisms and heterogeneity among the inputs. Here, we developed GHTNet (General Hybrid Transformer Network), a transformer-based model that predicts TF-DNA binding specificity. GHTNet decodes the relationship between tissue-specific TF-DNA binding and chromatin features through an input scheme that accepts alternative inputs, and it reveals important gene regions and tissue-specific motifs. Our experiments show that GHTNet performs excellently, achieving an absolute improvement of about 5% over existing methods. Analysis of the TF-DNA binding mechanism shows that the importance of TF-DNA binding features varies across tissues. The best predictor is based on DNA sequence, followed by epigenomic features and DNA shape. In addition, cross-species studies help address the problem of limited data, offering new approaches for data-scarce settings. Moreover, GHTNet is applied to interpret the relationships among TFs, chromatin features, and diseases associated with AD46 tissue. This paper demonstrates that GHTNet is an accurate and robust framework for deciphering tissue-specific TF-DNA binding and interpreting non-coding regions.
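The abstract describes a transformer-based model that combines DNA sequence with optional chromatin feature inputs. The following is a minimal, hypothetical sketch of that general idea, not GHTNet's actual architecture: it one-hot encodes a DNA sequence, appends per-position feature tracks as extra channels (the `tracks` argument is an assumed stand-in for epigenomic or shape signals), and applies single-head scaled dot-product self-attention with identity projections. All function names are illustrative assumptions.

```python
import math
from typing import List, Optional

# Hypothetical illustration only: this is NOT the GHTNet architecture.
BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """One-hot encode a DNA sequence into a (length x 4) matrix; non-ACGT bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def build_input(seq: str, tracks: Optional[List[List[float]]] = None) -> List[List[float]]:
    """Append optional per-position feature tracks (assumed, e.g. an epigenomic
    signal or a DNA shape value) to the one-hot sequence as extra channels."""
    x = one_hot(seq)
    for track in tracks or []:
        if len(track) != len(seq):
            raise ValueError("each track must match the sequence length")
        for row, value in zip(x, track):
            row.append(float(value))
    return x


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: List[List[float]]) -> List[List[float]]:
    """Single-head scaled dot-product self-attention with Q = K = V = x
    (no learned projections), returning one output vector per position."""
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Each output is an attention-weighted average of all positions.
        out.append([sum(weights[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return out


if __name__ == "__main__":
    seq = "ACGTN"
    x = build_input(seq, tracks=[[0.1, 0.5, 0.9, 0.2, 0.0]])
    out = self_attention(x)
    print(len(out), len(out[0]))  # 5 5
```

Switching inputs in this sketch only changes the number of channels per position, so the attention step is unchanged whether the model sees sequence alone or sequence plus additional tracks. A real model would add learned projections, positional encodings, and a classification head.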

Keywords: TF-DNA binding; chromatin features; deep learning; tissue specific.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites / genetics
  • Chromatin* / genetics
  • DNA / genetics
  • DNA / metabolism
  • Protein Binding
  • Transcription Factors* / genetics

Substances

  • Chromatin
  • Transcription Factors
  • DNA

Grants and funding

This work was supported by the National Natural Science Foundation of China under Grant No. 62272067; the Scientific Research Foundation of Sichuan Province under Grant No. 2022001; and the 2011 Collaborative Innovation Center for Image and Geospatial Information of Sichuan Province.