Frequency spectra characterization of noncoding human genomic sequences

Genes Genomics. 2020 Oct;42(10):1215-1226. doi: 10.1007/s13258-020-00980-2. Epub 2020 Aug 31.

Abstract

Background: Noncoding sequences have been shown to possess regulatory functions. Their classification is challenging because they do not show well-defined nucleotide patterns that correlate with their biological functions. Genomic signal processing techniques such as the Fourier transform have been employed to characterize coding and noncoding sequences. Applying this transformation to a systematic whole-genome noncoding library, such as the ENCODE database, can provide evidence of periodic behaviour in noncoding sequences that correlates with their regulatory functions.
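As an illustration of what a frequency spectrum of a nucleotide sequence can look like, the following minimal sketch maps a sequence to four binary indicator sequences (one per base, a common encoding in genomic signal processing) and sums their squared discrete Fourier transform magnitudes. The encoding, the function name `power_spectrum`, and the toy input are assumptions made for illustration; the abstract does not state which numerical mapping or transform implementation the authors used.

```python
import cmath


def power_spectrum(seq):
    """Return the power spectrum of a DNA string.

    Assumption: each base (A, C, G, T) is encoded as a binary indicator
    sequence, and the spectrum is the sum of the squared DFT magnitudes
    of the four indicator sequences. Other characters contribute nothing.
    """
    seq = seq.upper()
    n = len(seq)
    spectrum = [0.0] * n
    for base in "ACGT":
        # 1.0 where the base occurs, 0.0 elsewhere
        indicator = [1.0 if ch == base else 0.0 for ch in seq]
        for k in range(n):
            # Direct O(n^2) DFT; adequate for short illustrative inputs
            coeff = sum(
                x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(indicator)
            )
            spectrum[k] += abs(coeff) ** 2
    return spectrum


# Synthetic toy input with an exact period of 3 (not real genomic data)
toy = "ACG" * 8
spec = power_spectrum(toy)
# Strongest non-zero frequency in the first half of the spectrum
peak = max(range(1, len(toy) // 2 + 1), key=lambda k: spec[k])
```

For a sequence of length N with an exact period p, the energy concentrates at frequency index N/p, so the toy input above peaks at index 8. The direct transform is quadratic in sequence length; a fast Fourier transform would be the usual choice for genome-scale data.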

Objective: The objective of this study was to classify different noncoding regulatory regions through their frequency spectra.

Methods: We applied machine learning algorithms to classify the frequency spectra of noncoding regulatory sequences.
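The idea of classifying spectra can be sketched with a deliberately simple stand-in. The example below uses a nearest-centroid classifier, not the support vector machine mentioned in the Results, because it fits in a few lines of standard-library Python. The function names, class labels, and feature vectors are all hypothetical, and the numbers are synthetic toy values rather than data from the study.

```python
import math


def fit_centroids(spectra, labels):
    """Average the feature vectors (spectra) of each class."""
    sums, counts = {}, {}
    for vec, label in zip(spectra, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Synthetic spectra for two hypothetical regulatory classes
train_spectra = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.1, 0.2, 1.0],
    [0.0, 0.1, 0.9],
]
train_labels = ["class_a", "class_a", "class_b", "class_b"]

centroids = fit_centroids(train_spectra, train_labels)
guess_a = predict(centroids, [0.8, 0.1, 0.2])
guess_b = predict(centroids, [0.1, 0.0, 0.8])
```

A nearest-centroid rule only separates classes whose mean spectra differ. A support vector machine can learn more flexible decision boundaries, which may be one reason such classifiers are favoured for this kind of task.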

Results: Sequences from different regulatory regions, cell lines, and chromosomes possessed distinct frequency spectra, and machine learning classifiers (such as support vector machines) successfully discriminated among regulatory regions, thus correlating the frequency spectra with their biological functions.

Conclusion: Our work supports the idea that there are patterns in the noncoding sequences of the genome.

Keywords: ENCODE; Genomic signal processing; Human genome; Noncoding sequence; Fourier analysis; Spectral classification.

MeSH terms

  • Algorithms
  • Genome, Human / genetics*
  • Genomics*
  • Humans
  • Machine Learning*
  • Nucleotides / genetics
  • Regulatory Sequences, Nucleic Acid / genetics*

Substances

  • Nucleotides