CFA: An explainable deep learning model for annotating the transcriptional roles of cis-regulatory modules based on epigenetic codes

Comput Biol Med. 2023 Jan:152:106375. doi: 10.1016/j.compbiomed.2022.106375. Epub 2022 Nov 29.

Abstract

Metazoan gene expression is controlled by modular DNA segments called cis-regulatory modules (CRMs). CRMs can act as promoters, enhancers, or insulators, adding further layers of transcriptional regulation. Experiments for characterizing CRM roles are low-throughput and costly, so large-scale investigation of CRM function still depends on computational methods. However, existing in silico tools recognize only enhancers or only promoters, and therefore accumulate errors when promoter, enhancer, and insulator roles are considered together. Currently, no algorithm considers all of these CRM roles concurrently. In this research, we developed the CRM Function Annotator (CFA) model. CFA provides complete labeling of CRM transcriptional roles by interpreting epigenetic profiles. We demonstrated that CFA achieves high performance (test macro auROC/auPRC = 94.1%/90.3%) and outperforms existing tools in promoter/enhancer/insulator identification. Inspection of CFA also shows that, when labeling CRM roles, it recognizes explainable epigenetic codes consistent with previous findings. By considering higher-order combinations of these epigenetic codes, CFA significantly reduces false-positive rates in CRM transcriptional role annotation. CFA is available at https://github.com/cobisLab/CFA/.
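The reported "macro auROC/auPRC" averages a per-role score over the three CRM roles. The following is a minimal sketch of how such macro-averaged metrics are commonly computed, assuming each CRM receives an independent score for each role (one-vs-rest) and assuming auPRC is estimated as average precision. The names `ROLES`, `auroc`, `average_precision`, and `macro_scores` are hypothetical and are not taken from the CFA repository. The paper's exact averaging and precision-recall interpolation may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical role order; CFA's actual output layout may differ.
ROLES = ("promoter", "enhancer", "insulator")


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """auPRC estimated as average precision: mean precision at each positive's rank.

    Tied scores are not grouped here, so results can differ slightly from
    library implementations when ties are present.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def macro_scores(y_true: List[List[int]], y_score: List[List[float]]) -> Tuple[float, float]:
    """Unweighted mean of per-role auROC and auPRC (one column per role in ROLES)."""
    rocs, prcs = [], []
    for k in range(len(ROLES)):
        labels = [row[k] for row in y_true]
        scores = [row[k] for row in y_score]
        rocs.append(auroc(labels, scores))
        prcs.append(average_precision(labels, scores))
    return sum(rocs) / len(rocs), sum(prcs) / len(prcs)


if __name__ == "__main__":
    # Toy data: a CRM may carry more than one role (last row is promoter + enhancer).
    y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
    y_score = [[0.9, 0.1, 0.2], [0.2, 0.8, 0.1], [0.1, 0.3, 0.7], [0.8, 0.6, 0.3]]
    print(macro_scores(y_true, y_score))  # (1.0, 1.0): every role is ranked perfectly
```

Macro averaging weights each role equally regardless of how many CRMs carry it, so a rarer role (plausibly insulators) influences the summary as much as a common one.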

Keywords: Epigenetic code; Transcriptional regulation; cis-regulatory module function.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Deep Learning*
  • Epigenesis, Genetic / genetics
  • Promoter Regions, Genetic / genetics