Cross-species enhancer prediction using machine learning

Genomics. 2022 Sep;114(5):110454. doi: 10.1016/j.ygeno.2022.110454. Epub 2022 Aug 25.

Abstract

Cis-regulatory elements (CREs) are non-coding parts of the genome that play a critical role in regulating gene expression. Enhancers, an important class of CREs, interact with genes to influence complex traits such as disease, heat tolerance and growth rate. Much of what is known about enhancers comes from studies of humans and a few model organisms such as mouse, while little is known about other mammalian species. Previous studies have attempted to identify enhancers in less-studied mammals using comparative genomics, but with limited success. Recently, machine learning (ML) techniques have shown promising results in predicting enhancer regions. Here, we investigated the ability of ML methods to identify enhancers in three non-model mammalian species (cattle, pig and dog) using human and mouse enhancer data from VISTA and publicly available ChIP-seq data. We tested nine models, using four different representations of the DNA sequences, for cross-species prediction with both the VISTA dataset and species-specific ChIP-seq data. We identified between 809,399 and 877,278 enhancer-like regions (ELRs) in the study species (11.6-13.7% of each genome). These proportions were close to the ~8% of the human genome covered by ELRs. We propose that our ML methods have predictive ability for identifying enhancers in non-model mammalian species. We have provided a list of high-confidence enhancers at https://github.com/DaviesCentreInformatics/Cross-species-enhancer-prediction and believe these enhancers will be of great use to the community.
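The abstract does not name its four DNA sequence representations. As an illustration only, the sketch below shows two representations commonly used for enhancer classifiers, assumed here rather than taken from the study: a one-hot matrix (one row per base, suited to convolutional deep learning models) and a normalised k-mer frequency vector (a fixed-length vector suited to classical ML models). The function names `one_hot` and `kmer_frequencies` are hypothetical.

```python
from itertools import product
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as an L x 4 matrix (columns A, C, G, T).

    Assumption: ambiguous bases such as N become all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_frequencies(seq: str, k: int = 3) -> List[float]:
    """Return the relative frequency of each of the 4**k k-mers.

    k-mers are listed in lexicographic order (AA..A first); any k-mer
    containing a non-ACGT character is skipped (an assumption, not the
    study's documented behaviour).
    """
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    # Normalise so sequences of different lengths are comparable.
    return [counts[km] / total if total else 0.0 for km in kmers]
```

For example, `one_hot("ACGN")` yields four rows, the last of them all zeros, and `kmer_frequencies(seq, k=3)` always returns 64 values regardless of sequence length. That fixed length is what lets regions of different sizes be fed to the same classifier.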

Keywords: ChIP-seq; Cross-species enhancer prediction; Deep learning; Livestock; Machine learning.

MeSH terms

  • Animals
  • Base Sequence
  • Cattle
  • Dogs
  • Enhancer Elements, Genetic*
  • Genome, Human
  • Genomics* / methods
  • Humans
  • Machine Learning
  • Mammals / genetics
  • Mice
  • Swine