Prediction of enhancer-promoter interactions using the cross-cell type information and domain adversarial neural network

BMC Bioinformatics. 2020 Nov 7;21(1):507. doi: 10.1186/s12859-020-03844-4.

Abstract

Background: Enhancer-promoter interactions (EPIs) play key roles in transcriptional regulation and disease progression. Although several computational methods have been developed to predict such interactions, their performance is unsatisfactory when the training and testing data come from different cell lines. It is still unclear to what extent cross-cell-line prediction can be made based on sequence-level information.

Results: In this work, we present a novel Sequence-based method (called SEPT) to predict Enhancer-Promoter interactions in a new cell line by using cross-cell information and Transfer learning. SEPT first learns the features of enhancers and promoters from DNA sequences with a convolutional neural network (CNN), then uses the gradient reversal layer of transfer learning to reduce cell-line-specific features while retaining the features associated with EPIs. When the locations of enhancers and promoters are provided in a new cell line, SEPT can successfully recognize EPIs in this new cell line based on labeled data from other cell lines. The experimental results show that SEPT can effectively learn the latent, important EPI-related features shared between cell lines and achieves the best prediction performance in terms of AUC (the area under the receiver operating characteristic curve).
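
The gradient reversal step described above can be illustrated with a minimal sketch. It is not the SEPT implementation: the class name, the helper function, and the scaling factor `lam` are hypothetical, and plain Python lists stand in for tensors. The layer passes features through unchanged in the forward pass and flips the sign of the domain (cell line) classifier's gradient in the backward pass, so the shared feature extractor is pushed away from cell-line-specific features while still following the EPI-label gradient.

```python
class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lam in the backward pass.

    `lam` is a hypothetical trade-off weight, not a value taken from the paper.
    """

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features reach the domain (cell line) classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # The sign flip makes the feature extractor *increase* the domain loss.
        return [-self.lam * g for g in grad_from_domain_classifier]


def combined_feature_gradient(grad_epi, grad_domain, grl):
    """Gradient reaching the shared feature extractor:
    the EPI-label gradient plus the reversed domain gradient."""
    reversed_grad = grl.backward(grad_domain)
    return [a + b for a, b in zip(grad_epi, reversed_grad)]


grl = GradientReversal(lam=0.5)
features = grl.forward([1.0, 2.0])
feature_grad = combined_feature_gradient([1.0, 1.0], [2.0, -4.0], grl)
```

In a deep learning framework the same behaviour would normally be expressed as a custom autograd operation rather than an explicit `backward` call.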

Conclusions: SEPT is an effective method for predicting EPIs in a new cell line. The domain adversarial architecture of transfer learning used in SEPT can learn the latent EPI features shared among cell lines from all other existing labeled data. SEPT is expected to be of interest to researchers concerned with biological interaction prediction.

Keywords: Cell line; Convolutional neural network; Enhancer–promoter interactions; Gradient reversal layer; Transfer learning.

MeSH terms

  • Area Under Curve
  • Cell Line
  • Humans
  • Neural Networks, Computer*
  • Promoter Regions, Genetic / genetics*
  • ROC Curve
  • Regulatory Sequences, Nucleic Acid / genetics*