Gene essentiality prediction based on fractal features and machine learning

Mol Biosyst. 2017 Feb 28;13(3):577-584. doi: 10.1039/c6mb00806b.

Abstract

Essential genes are required for the viability of an organism. Accurate and rapid identification of new essential genes is of substantial theoretical interest to synthetic biology and has practical applications in biomedicine. Fractal analysis offers a way to examine genetic structure at different scales. In this study, we present machine learning-based methods that use only fractal features and evaluate them on the problem of predicting essential genes in bacterial genomes. Six fractal features were used to train five supervised classification methods on the binary classification task. The optimal parameters of these classifiers were determined by grid search. All currently available identified genes from the Database of Essential Genes were used to build the classifiers. The fractal features proved more robust and more powerful in prediction performance. Statistically, the ELM method was superior in predicting essential genes. Nonparametric tests of the average AUC and ACC showed that the fractal feature set performed much better than the five other feature sets compared. Our approach is a promising and convenient way to identify new bacterial essential genes.
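The evaluation loop described in the abstract (grid search over classifier parameters, scored by average AUC and ACC) can be illustrated with a minimal sketch. This is not the authors' implementation: the k-nearest-neighbour classifier, the five-fold split, the parameter grid, and the synthetic six-column data standing in for the six fractal features are all assumptions made for illustration, and the helper names (auc, knn_score, cross_validate, grid_search, make_toy_data) are hypothetical.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_score(train_X, train_y, x, k):
    """Fraction of the k nearest training points labelled essential (1)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )[:k]
    return sum(y for _, y in nearest) / k

def cross_validate(X, y, k, folds=5):
    """Mean (AUC, ACC) of a k-NN classifier over interleaved folds."""
    aucs, accs = [], []
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        tr_X = [X[i] for i in range(len(X)) if i not in test_idx]
        tr_y = [y[i] for i in range(len(X)) if i not in test_idx]
        te_idx = sorted(test_idx)
        scores = [knn_score(tr_X, tr_y, X[i], k) for i in te_idx]
        truth = [y[i] for i in te_idx]
        aucs.append(auc(scores, truth))
        # ACC: threshold the neighbour vote at 0.5.
        correct = sum((s >= 0.5) == (t == 1) for s, t in zip(scores, truth))
        accs.append(correct / len(truth))
    return sum(aucs) / folds, sum(accs) / folds

def grid_search(X, y, grid):
    """Evaluate every candidate k and keep the one with the best mean AUC."""
    results = {k: cross_validate(X, y, k) for k in grid}
    best = max(results, key=lambda k: results[k][0])
    return best, results

def make_toy_data(n=200, n_features=6, seed=0):
    """Synthetic stand-in data (assumption): six Gaussian features per gene."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = essential, 0 = non-essential
        X.append([rng.gauss(1.0 * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y

X, y = make_toy_data()
best_k, results = grid_search(X, y, grid=[1, 3, 5, 7, 9])
print("best k:", best_k, "mean (AUC, ACC):", results[best_k])
```

In practice the toy data would be replaced by a feature matrix computed from real gene sequences, and the same loop would be repeated for each classifier and each feature set before comparing their average AUC and ACC.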

MeSH terms

  • Bacteria / genetics
  • Databases, Nucleic Acid
  • Fractals*
  • Genes, Bacterial
  • Genes, Essential*
  • Genomics / methods
  • Humans
  • Machine Learning*
  • Nonlinear Dynamics
  • ROC Curve