Interpretation of deep learning in genomics and epigenomics

Brief Bioinform. 2021 May 20;22(3):bbaa177. doi: 10.1093/bib/bbaa177.

Abstract

Machine learning methods have been widely applied to big data analysis in genomics and epigenomics research. Although accuracy and efficiency are common goals in many modeling tasks, model interpretability is especially important in these studies for understanding the underlying molecular and cellular mechanisms. Deep neural networks (DNNs) have recently gained popularity in various types of genomic and epigenomic studies because they can exploit large-scale high-throughput bioinformatics data and achieve high accuracy in prediction and classification. However, DNNs are often criticized for their limited ability to explain their predictions, owing to their black-box nature. In this review, we present current developments in the model interpretation of DNNs, focusing on their applications in genomics and epigenomics. We first describe state-of-the-art DNN interpretation methods in representative machine learning fields. We then summarize the DNN interpretation methods used in recent studies on genomics and epigenomics, focusing on current data- and computing-intensive topics such as sequence motif identification, genetic variations, gene expression, chromatin interactions and non-coding RNAs. We also present the biological discoveries that resulted from these interpretation methods. Finally, we discuss the advantages and limitations of current interpretation approaches in the context of genomic and epigenomic studies. Contact: xiaoman@mail.ucf.edu, haihu@cs.ucf.edu.

Keywords: deep neural network; epigenomics; feature interpretation; genomics; model interpretation.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Chromatin / metabolism
  • Computational Biology / methods
  • DNA / genetics
  • Deep Learning*
  • Epigenesis, Genetic*
  • Gene Expression
  • Genomics*
  • Neural Networks, Computer*
  • Protein Binding
  • RNA / genetics

Substances

  • Chromatin
  • RNA
  • DNA