Integrated entropy-based approach for analyzing exons and introns in DNA sequences

BMC Bioinformatics. 2019 Jun 10;20(Suppl 8):283. doi: 10.1186/s12859-019-2772-y.

Abstract

Background: Over the last decade, numerous algorithms and methods, including entropy-based quantitative methods, have been developed to analyze complex DNA sequences. Exons and introns are among the most notable components of DNA, and their identification and prediction remain a focus of state-of-the-art research.

Results: In this study, we designed an integrated entropy-based analysis approach that combines a modified topological entropy calculation, a genomic signal processing (GSP) method, and singular value decomposition (SVD) to investigate exons and introns in DNA sequences. We optimized and implemented the topological entropy and the generalized topological entropy to calculate the complexity of DNA sequences, highlighting the characteristics of repetitive sequences. Comparing the numerical entropy values of exons and introns, we observed that they differ significantly. After converting DNA data to numerical topological entropy values, we applied the SVD method to investigate exon and intron regions on a single gene sequence. Additionally, several genes across five species were used for exon prediction.
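
As an illustration of how a DNA sequence can be converted to numerical entropy values, the following minimal sketch computes the standard finite-sequence topological entropy over the four-letter alphabet and a sliding-window profile of it. It is an assumption-laden sketch, not the modified or generalized entropy used in this study: the function names `topological_entropy` and `entropy_profile`, the window length, and the step size are hypothetical choices made for illustration only.

```python
import math


def topological_entropy(seq: str, alphabet_size: int = 4) -> float:
    """Standard topological entropy of a finite sequence (illustrative only).

    Picks the largest subword length n with alphabet_size**n + n - 1 <= len(seq),
    truncates the sequence to that length, and returns
    log_alphabet_size(number of distinct length-n subwords) / n.
    """
    n = 0
    while alphabet_size ** (n + 1) + n <= len(seq):
        n += 1
    if n == 0:
        raise ValueError("sequence is too short for the chosen alphabet size")
    prefix = seq[: alphabet_size ** n + n - 1]
    # Collect every distinct subword of length n in the truncated prefix.
    subwords = {prefix[i:i + n] for i in range(len(prefix) - n + 1)}
    return math.log(len(subwords), alphabet_size) / n


def entropy_profile(seq: str, window: int = 17, step: int = 1) -> list:
    """Entropy of each sliding window; window and step are assumed values."""
    return [
        topological_entropy(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]
```

Under this definition a window that contains every possible subword scores 1, while a fully repetitive window such as a homopolymer run scores 0, which is why repetitive regions stand out in the resulting numerical signal.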

Conclusions: Our approach not only helps to explore the complexity of DNA sequences and their functional elements, but also provides an entropy-based GSP method for analyzing exon and intron regions. The approach is feasible across different species and can be extended to analyze other components in both coding and noncoding regions of DNA sequences.

Keywords: DNA sequences; Exon and intron prediction; Generalized topological entropy; Genomic signal processing; Information entropy.

MeSH terms

  • Algorithms
  • Base Sequence
  • Chromosomes, Human / genetics
  • DNA / genetics
  • Entropy*
  • Exons / genetics*
  • Genome, Human
  • Humans
  • Introns / genetics*
  • Promoter Regions, Genetic / genetics
  • ROC Curve
  • Sequence Analysis, DNA / methods
  • Signal Processing, Computer-Assisted

Substances

  • DNA