Physicochemical property based computational scheme for classifying DNA sequence elements of Saccharomyces cerevisiae

Comput Biol Chem. 2019 Apr:79:193-201. doi: 10.1016/j.compbiolchem.2018.12.014. Epub 2018 Dec 26.

Abstract

The generation of huge volumes of "omics" data necessitates the development and application of computational methods to annotate the data in terms of biological features. In the context of DNA sequences, it is important to unravel the hidden physicochemical signatures. For this purpose, we considered various sequence elements of the model organism Saccharomyces cerevisiae: promoters, ACS, LTRs, telomeres, and retrotransposons. Contributions from di-nucleotides play a major role in studying the DNA conformation profile. The physicochemical parameters used are hydrogen bonding energy, stacking energy, and solvation energy per base pair. Our computational study shows that all sequence elements considered have distinctive physicochemical signatures, which can be exploited for prediction experiments. The order seen in a DNA sequence is dictated by biological regions, and hence there is dependency in the sequence makeup. Keeping this in mind, we propose two computational schemes: (a) using a windowing block size procedure and (b) using di-nucleotide transitions. We obtained a better discriminating profile when we analyzed the sequence data in a windowed manner. In the second, novel approach, we introduce the di-nucleotide transition probability matrix (DTPM) to study the hidden layer of information embedded in the sequences. The DTPM is used as weights for scanning and prediction. This scheme incorporates the memory property, which is more realistic for studying the physicochemical properties embedded in DNA sequences. Our analysis shows that the DTPM scheme performs better than the existing method in this applied region. Characterization of these elements will be key to genome editing applications, and advanced machine learning approaches may also require such distinctive profiles as useful input features.
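The two schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: di-nucleotides are read as overlapping steps (offset 1), the DTPM is taken to be a row-normalized 16 x 16 count matrix of transitions between successive di-nucleotide steps, and the energy table passed to `windowed_profile` is a caller-supplied placeholder, since the abstract gives no parameter values. The function names `dinucleotides`, `windowed_profile`, and `dtpm` are hypothetical.

```python
from itertools import product

# The 16 di-nucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]
INDEX = {d: i for i, d in enumerate(DINUCS)}


def dinucleotides(seq):
    """Split a sequence into overlapping di-nucleotide steps (assumed offset 1)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def windowed_profile(seq, energy, window):
    """Mean per-step value over a sliding window of di-nucleotide steps.

    `energy` maps each di-nucleotide to a number (for example a stacking
    energy); the values are supplied by the caller and are NOT from the paper.
    """
    steps = [energy[d] for d in dinucleotides(seq)]
    return [
        sum(steps[i:i + window]) / window
        for i in range(len(steps) - window + 1)
    ]


def dtpm(seqs):
    """Row-normalized counts of transitions between successive di-nucleotides.

    This is one plausible reading of a di-nucleotide transition probability
    matrix; the paper's exact definition may differ.
    """
    counts = [[0] * 16 for _ in range(16)]
    for seq in seqs:
        steps = dinucleotides(seq)
        for a, b in zip(steps, steps[1:]):
            counts[INDEX[a]][INDEX[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for di-nucleotides never seen as a source stay all zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

In this sketch, `dtpm` would be estimated separately from the training sequences of each element class (promoter, telomere, and so on), and the resulting matrices compared or used to weight a scan. How the weights enter the scanning step is not specified in the abstract and is left out here.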

Keywords: Computational scheme; DNA sequence elements; DTPM; Machine learning; Physicochemical properties.

MeSH terms

  • Chemistry, Physical
  • Computational Biology*
  • DNA / chemistry*
  • DNA / classification*
  • DNA / genetics
  • Hydrogen Bonding
  • Molecular Dynamics Simulation*
  • Saccharomyces cerevisiae / genetics*
  • Sequence Analysis, DNA

Substances

  • DNA