Assessment of length distributions between non-coding and coding sequences amongst two model organisms

Int J Data Min Bioinform. 2010;4(5):535-52. doi: 10.1504/ijdmb.2010.035899.

Abstract

The growing availability of genomic DNA and cDNA sequence data has accelerated data mining in the genomics era. We aim to investigate the length distributions of the non-coding and coding regions of protein-coding genes in two model organisms, Arabidopsis thaliana and Drosophila melanogaster. A non-linear functional relationship model was applied, and a strong correlation was found between the Coding Sequence (CDS) and non-coding sequence regions, conditional on the 5' UTR data. Significant differences were found between the protein functional classes and each gene region. Examination of the non-coding and coding regions of these organisms has revealed possible correlations.

MeSH terms

  • 5' Untranslated Regions
  • Animals
  • Arabidopsis / genetics*
  • DNA, Complementary / chemistry
  • Databases, Genetic
  • Drosophila melanogaster / genetics*
  • Genomics / methods*
  • Sequence Analysis, DNA / methods

Substances

  • 5' Untranslated Regions
  • DNA, Complementary