On DNA numerical representations for genomic similarity computation

PLoS One. 2017 Mar 21;12(3):e0173288. doi: 10.1371/journal.pone.0173288. eCollection 2017.

Abstract

Genomic signal processing (GSP) refers to the use of signal processing for the analysis of genomic data. GSP methods require the transformation or mapping of the genomic data to a numeric representation. To date, several DNA numeric representations (DNR) have been proposed; however, it is not clear what the properties of each DNR are, or how the choice of one will affect the results when a signal processing technique is used to analyze the mapped data. In this paper, we present an experimental study of the characteristics of nine of the most frequently used DNR. The objective of this paper is to evaluate the behavior of each representation when used to measure the similarity of a given pair of DNA sequences.
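
As an illustration of what mapping a DNA sequence to a numeric representation can look like, the following is a minimal sketch in Python. The abstract does not list the nine DNR studied, so the two mappings shown here (an integer mapping and a Voss-style binary indicator mapping) are assumptions chosen because they are commonly described in the GSP literature, not a statement of the authors' selection. The integer values in `INTEGER_MAP`, the function names, and the use of Pearson correlation as the similarity measure are likewise hypothetical choices made for the example.

```python
import math

# Hypothetical integer mapping, chosen for illustration only; the paper's
# actual numeric assignments are not given in the abstract.
INTEGER_MAP = {"A": 0, "C": 1, "G": 2, "T": 3}


def integer_representation(seq):
    """Map each nucleotide to a single integer (one numeric signal)."""
    return [INTEGER_MAP[base] for base in seq.upper()]


def voss_representation(seq):
    """Map a sequence to four binary indicator signals, one per nucleotide."""
    seq = seq.upper()
    return {
        base: [1 if symbol == base else 0 for symbol in seq]
        for base in "ACGT"
    }


def pearson_similarity(x, y):
    """Pearson correlation of two equal-length numeric signals.

    Used here as one possible similarity measure (an assumption).
    Returns 0.0 when either signal is constant, where the correlation
    is undefined.
    """
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    first = "ACGTTGCA"
    second = "ACGTAGCA"

    # Similarity under the integer mapping.
    print(pearson_similarity(integer_representation(first),
                             integer_representation(second)))

    # Similarity of the "T" indicator signal under the Voss-style mapping.
    print(pearson_similarity(voss_representation(first)["T"],
                             voss_representation(second)["T"]))
```

The same pair of sequences can receive different similarity scores depending on the mapping, which is the kind of dependence on the chosen DNR that the study sets out to evaluate.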

Publication types

  • Comparative Study

MeSH terms

  • Algorithms
  • Animals
  • Computer Simulation
  • Cyclooxygenase 1 / genetics
  • Databases, Genetic
  • Humans
  • Ribosomal Proteins / genetics
  • Sequence Analysis, DNA / methods*
  • Sequence Homology
  • Signal Processing, Computer-Assisted*

Substances

  • Ribosomal Proteins
  • ribosomal protein S18
  • Cyclooxygenase 1

Grants and funding

The authors acknowledge the support received from Consejo Nacional de Ciencia y Tecnología and Programa para el Desarrollo Profesional Docente. Any opinions, findings, conclusions or recommendations expressed in this material are the authors’ and may not reflect the views of the sponsors.