ezGeno: an automatic model selection package for genomic data analysis

Bioinformatics. 2021 Dec 22;38(1):30-37. doi: 10.1093/bioinformatics/btab588.

Abstract

Motivation: To facilitate the process of tailor-making a deep neural network for exploring the dynamics of genomic DNA, we have developed a hands-on package called ezGeno. ezGeno automates the search over parameters and network structures and can be applied to any kind of 1D genomic data. Combinations of multiple such 1D features can also be used as input.
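
As an illustration of what "1D genomic data" and a combination of 1D features might look like in practice, here is a minimal sketch in plain Python. It one-hot encodes a DNA sequence and appends a per-position signal (for example, a DNase track) as an extra channel. The function names `one_hot` and `stack_features` are hypothetical and are not part of the ezGeno API; how ezGeno itself encodes or combines its inputs is not stated in this abstract.

```python
# Illustrative sketch only: helper names are hypothetical, not ezGeno functions.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as one 4-element row (A, C, G, T) per position.

    Bases outside ACGT (such as N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def stack_features(seq, signal):
    """Append a per-position signal value as a fifth channel to each row."""
    if len(seq) != len(signal):
        raise ValueError("sequence and signal must have the same length")
    return [row + [float(s)] for row, s in zip(one_hot(seq), signal)]

# A 5-bp toy sequence with an assumed, made-up signal track.
example = stack_features("ACGTN", [0.0, 0.5, 1.0, 0.5, 0.0])
for row in example:
    print(row)
```

Each row is one sequence position and each column is one channel, which is the shape a 1D convolutional network would typically consume. Stacking channels is only one possible design; feeding each feature type to its own sub-network is another.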

Results: For the task of predicting TF binding using genomic sequences as the input, ezGeno can consistently return the best-performing set of parameters and network structure, as well as highlight the important segments within the original sequences. For the task of predicting tissue-specific enhancer activity using both sequence and DNase feature data as the input, ezGeno also regularly outperforms the hand-designed models. Furthermore, we demonstrate that ezGeno is more efficient and more accurate than both the one-layer DeepBind model and AutoKeras, an open-source AutoML package.

Availability and implementation: The ezGeno package can be freely accessed at https://github.com/ailabstw/ezGeno.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genome
  • Genomics*
  • Neural Networks, Computer
  • Protein Binding
  • Software*