Improving Enhancer Identification with a Multi-Classifier Stacked Ensemble Model

J Mol Biol. 2023 Dec 1;435(23):168314. doi: 10.1016/j.jmb.2023.168314. Epub 2023 Oct 16.

Abstract

Enhancers are DNA regions that regulate gene expression. They are usually found upstream or downstream of a gene, or even within a gene's introns, and are typically located far from the genes they control. Integrating experimental and computational approaches makes it possible to uncover these regulatory elements within DNA sequences. Experimental techniques such as ChIP-seq and ATAC-seq can identify genomic regions that are bound by transcription factors or accessible to regulatory proteins. Computational techniques, in contrast, can predict enhancers from sequence features and epigenetic modifications. In this study, we developed a multi-classifier stacked ensemble (MCSE-enhancer) model that can accurately identify enhancers. We used feature descriptors derived from various physicochemical properties as input to six baseline classifiers and built a stacked classifier on top of them, which outperformed previous enhancer classification techniques in terms of accuracy, specificity, sensitivity, and Matthews correlation coefficient. Our model achieved an accuracy of 81.5%, a 2-3% improvement over existing models.
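
The four evaluation measures named above (accuracy, sensitivity, specificity, and Matthews correlation coefficient) all follow from the confusion matrix of a binary enhancer/non-enhancer classifier. The following is a minimal sketch of their standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the toy labels are assumptions made for illustration, with 1 standing for "enhancer" and 0 for "non-enhancer".

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute standard binary classification metrics from 0/1 labels.

    Hypothetical helper for illustration: 1 = enhancer, 0 = non-enhancer.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on enhancers
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on non-enhancers

    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Toy example (made-up labels, not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion-matrix cells, so it remains informative when the enhancer and non-enhancer classes are imbalanced.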

Keywords: DNA sequences; bioinformatics; computational biology; enhancers; meta classification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology* / methods
  • DNA / chemistry
  • DNA / genetics
  • Enhancer Elements, Genetic*
  • Machine Learning*
  • Sequence Analysis, DNA* / methods
  • Transcription Factors / chemistry

Substances

  • DNA
  • Transcription Factors