Modelling complex features from histone modification signatures using genetic algorithm for the prediction of enhancer region

Biomed Mater Eng. 2014;24(6):3807-14. doi: 10.3233/BME-141210.

Abstract

This paper presents a genetic-algorithm-based modelling method that generates novel logic-based features from DNA sequences enriched with H3K4me1 histone signatures. Histone signatures are currently represented mostly by k-mer content features, which cannot capture all the possible complex interactions among DNA segments. The main contributions include: (a) demonstrating that complex interactions exist among sequence segments in histone regions; and (b) developing a parse-tree representation of the complex logical features. The proposed feature is compared with k-mer content features using datasets from the mouse (mm9) genome. Evaluation results show that the new feature improves prediction performance, as measured by F-measure, on all datasets tested. In addition, tree-based features generated from a single chromosome generalize to predicting histone marks in other chromosomes not used in training. These findings have implications for feature design for histone signatures as well as for other classifiers.
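
To make the contrast between the two feature types concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation. The abstract does not specify the tree grammar, the operators, or the leaf tests, so everything here is an assumption: the function names `kmer_content` and `evaluate`, the operator set (AND, OR, NOT), and leaves that test whether a motif occurs in the sequence are all hypothetical. The genetic algorithm that would search over such trees is not shown.

```python
from itertools import product


def kmer_content(seq, k=2):
    """Relative frequency of every k-mer over the alphabet ACGT.

    Windows containing other characters (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def evaluate(tree, seq):
    """Evaluate a logical parse tree on a DNA sequence.

    Assumed encoding (hypothetical, not taken from the paper):
      - a leaf is a motif string, true if the motif occurs in seq;
      - an internal node is a tuple (op, child, ...) with op in
        {"AND", "OR", "NOT"}; "NOT" uses only its first child.
    """
    if isinstance(tree, str):
        return tree.upper() in seq.upper()
    op, *children = tree
    if op == "NOT":
        return not evaluate(children[0], seq)
    if op == "AND":
        return all(evaluate(child, seq) for child in children)
    if op == "OR":
        return any(evaluate(child, seq) for child in children)
    raise ValueError(f"unknown operator: {op!r}")


# A hypothetical tree: CG must occur, and either TATA occurs or GGG is absent.
example_tree = ("AND", "CG", ("OR", "TATA", ("NOT", "GGG")))
```

A k-mer content vector describes each segment independently, whereas a tree such as `example_tree` yields a single Boolean feature that depends on a combination of segments, which is the kind of interaction the abstract says k-mer content cannot represent.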

Keywords: Genetic algorithm; histone feature; tree-based feature.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Base Sequence
  • Chromosome Mapping / methods*
  • Computer Simulation
  • Enhancer Elements, Genetic / genetics*
  • Epigenesis, Genetic / genetics*
  • Histones / genetics*
  • Mice
  • Models, Genetic*
  • Molecular Sequence Data
  • Pattern Recognition, Automated / methods
  • Sequence Analysis, DNA / methods*

Substances

  • Histones