Genomic signal processing for DNA sequence clustering

PeerJ. 2018 Jan 24:6:e4264. doi: 10.7717/peerj.4264. eCollection 2018.

Abstract

Genomic signal processing (GSP) methods, which convert DNA data to numerical values, have recently been proposed, offering the opportunity to apply existing digital signal processing methods to genomic data. One of the most widely used methods for exploring data is cluster analysis, which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach to cluster analysis of DNA sequences based on GSP methods and the K-means algorithm. We also propose a visualization method that facilitates inspection and analysis of the results and of possible hidden behaviors. Our results support the feasibility of using the proposed method to find and easily visualize interesting features in sets of DNA data.
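The general idea of pairing a numerical DNA representation with K-means can be illustrated with a minimal sketch. The abstract does not say which GSP mapping or which features the authors use, so everything specific here is an assumption made for illustration: the Voss indicator mapping (one binary signal per base), the choice to summarize each signal by its mean, the farthest-first centroid initialization, the toy sequences, and the hypothetical helper names `voss_features` and `kmeans`. It is not the paper's method.

```python
from math import dist

BASES = "ACGT"

def voss_features(seq):
    """Map a non-empty DNA string to the mean of its four Voss indicator signals.

    Assumption: the Voss mapping stands in for whichever GSP mapping is used.
    """
    seq = seq.upper()
    # Indicator signal for base b: 1 where the sequence has b, 0 elsewhere.
    signals = {b: [1 if ch == b else 0 for ch in seq] for b in BASES}
    # Summarize each signal by its mean (the base frequency).
    return [sum(signals[b]) / len(seq) for b in BASES]

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far (an illustrative choice).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy sequences invented for this sketch: three AT-rich, three GC-rich.
sequences = [
    "ATATTAAATT", "TTAATATAAG", "AATTATATTA",
    "GCGCCGGGCC", "CCGGCGCGGA", "GGCCGCGCCG",
]
features = [voss_features(s) for s in sequences]
labels, centroids = kmeans(features, k=2)
print(labels)
```

On this toy input the AT-rich and GC-rich sequences fall into separate clusters. Real data would likely call for richer features (for example, spectral features of the numerical signals) and a library implementation of K-means with multiple random restarts.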

Keywords: COX1; DNA; Genomic signal processing; K-means; Sequence clustering.

Grants and funding

The authors received no funding for this work.