Multinomial Convolutions for Joint Modeling of Regulatory Motifs and Sequence Activity Readouts

Genes (Basel). 2022 Sep 8;13(9):1614. doi: 10.3390/genes13091614.

Abstract

A common goal in convolutional neural network (CNN) modeling of genomic data is to discover specific sequence motifs. Post hoc analysis methods aid in this task, but they depend on parameters whose optimal values are unclear, and applying the discovered motifs to new genomic data is not straightforward. As an alternative, we propose to learn convolutions as multinomial distributions, thus integrating interpretable motif discovery into CNN model fitting. We developed MuSeAM (Multinomial CNNs for Sequence Activity Modeling) by implementing multinomial convolutions in a CNN model. Through benchmarking, we demonstrate the efficacy of MuSeAM in accurately modeling genomic data while fitting multinomial convolutions that recapitulate known transcription factor motifs.
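
To make the idea of a multinomial convolution concrete, the following is a minimal NumPy sketch and not the MuSeAM implementation. It assumes that each filter position is constrained to a probability distribution over A, C, G, T by a softmax over unconstrained weights, and that a sequence is scored by the log-odds of the filter against a uniform background. The softmax parameterization, the uniform background, and all function names (`multinomial_filter`, `scan`, `one_hot`) are illustrative assumptions that the abstract does not specify.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def one_hot(seq, alphabet="ACGT"):
    # Encode a DNA string as a (length x 4) one-hot matrix.
    index = {base: i for i, base in enumerate(alphabet)}
    out = np.zeros((len(seq), len(alphabet)))
    for pos, base in enumerate(seq):
        out[pos, index[base]] = 1.0
    return out

def multinomial_filter(raw_weights):
    # Assumption: map unconstrained (width x 4) weights to one
    # multinomial distribution per filter position.
    return softmax(raw_weights, axis=1)

def scan(seq, raw_weights, background=0.25):
    # Assumption: score each window by the summed log-odds of the
    # multinomial filter relative to a uniform background.
    probs = multinomial_filter(raw_weights)
    log_odds = np.log(probs / background)
    x = one_hot(seq)
    width = probs.shape[0]
    return np.array([
        np.sum(x[i:i + width] * log_odds)
        for i in range(len(seq) - width + 1)
    ])

rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 4))      # hypothetical filter of width 4
probs = multinomial_filter(raw)    # each row sums to 1
scores = scan("ACGTACGTTGCA", raw) # one score per window
```

Because every row of `probs` is a valid distribution over nucleotides, a filter parameterized this way can be read directly as a position probability matrix, without a separate post hoc motif extraction step.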

Keywords: MPRA; convolutional neural networks; motifs; multinomial convolutional neural networks; multinomial convolutions.

MeSH terms

  • Genomics*
  • Neural Networks, Computer*
  • Transcription Factors / genetics

Substances

  • Transcription Factors

Grants and funding

This research received no external funding.