Computational methods of identification of pseudogenes based on functionality: entropy and GC content

Methods Mol Biol. 2014:1167:41-62. doi: 10.1007/978-1-4939-0835-6_4.

Abstract

Spectral entropy and GC content analyses reveal comprehensive structural features of DNA sequences. To illustrate the significance of these features, we analyze the β-esterase gene cluster, including the Est-6 gene and the ψEst-6 putative pseudogene, in seven species of the Drosophila melanogaster subgroup. The spectral entropies show distinctly lower structural ordering for ψEst-6 than for Est-6 in all species studied. However, entropy accumulation is not a completely random process for either gene, and it proves to be nucleotide dependent. Furthermore, GC content in synonymous positions is uniformly higher in Est-6 than in ψEst-6, in agreement with the reduced GC content generally observed in pseudogenes and nonfunctional sequences. The observed differences in entropy and GC content reflect an evolutionary shift associated with the process of pseudogenization and subsequent functional divergence of ψEst-6 and Est-6 after the duplication event. The data obtained show the relevance and significance of entropy and GC content analyses for pseudogene identification and for the comparative study of gene-pseudogene evolution.
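
The two measures named above can be illustrated with a minimal Python sketch. It is not the authors' procedure, which the abstract does not specify. It assumes one common formulation: GC content at third codon positions (GC3) as a proxy for synonymous positions, and spectral entropy as the normalized Shannon entropy of the Fourier power spectrum of a per-nucleotide indicator sequence. The function names gc3_content and spectral_entropy are hypothetical.

```python
import cmath
import math


def gc3_content(cds):
    """Fraction of G or C at third codon positions.

    Assumption: third positions stand in for synonymous positions.
    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    thirds = [cds[i] for i in range(2, usable, 3)]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(b in "GC" for b in thirds) / len(thirds)


def spectral_entropy(seq, base):
    """Normalized Shannon entropy (0..1) of the power spectrum of the
    0/1 indicator sequence for one nucleotide.

    Assumption: lower values mean power concentrated at few frequencies
    (stronger periodic ordering); higher values mean a flatter spectrum.
    """
    seq = seq.upper()
    n = len(seq)
    x = [1.0 if ch == base else 0.0 for ch in seq]

    # Plain DFT at frequencies k = 1 .. n//2. The k = 0 term is dropped
    # because it only reflects how often the base occurs.
    power = []
    for k in range(1, n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)

    total = sum(power)
    if total == 0 or len(power) < 2:
        return 0.0  # base absent or sequence too short to define a spectrum

    probs = [p / total for p in power if p > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(power))


if __name__ == "__main__":
    toy = "ATGGCCAAAGGTCTGACC"  # invented toy sequence, not Est-6 data
    print("GC3:", gc3_content(toy))
    for b in "ACGT":
        print(b, spectral_entropy(toy, b))
```

Computing the entropy separately for A, C, G and T mirrors the abstract's remark that entropy accumulation is nucleotide dependent. Under this formulation a strictly periodic sequence gives an entropy near 0, so a gene and its putative pseudogene could be compared nucleotide by nucleotide on aligned coding regions. The plain DFT is quadratic in sequence length and is chosen here only to keep the sketch dependency-free.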

MeSH terms

  • Animals
  • Base Composition
  • Codon
  • Computational Biology / methods*
  • Drosophila melanogaster / genetics
  • Entropy
  • Genomics / methods*
  • Multigene Family
  • Pseudogenes / genetics*

Substances

  • Codon