Similarity/dissimilarity calculation methods of DNA sequences: A survey

Xin Jin; Qian Jiang; Yanyan Chen; Shin-Jye Lee; Rencan Nie; Shaowen Yao; Dongming Zhou; Kangjian He

doi:10.1016/j.jmgm.2017.07.019

J Mol Graph Model. 2017 Sep:76:342-355. doi: 10.1016/j.jmgm.2017.07.019. Epub 2017 Jul 20.

Authors

Xin Jin¹, Qian Jiang¹, Yanyan Chen², Shin-Jye Lee³, Rencan Nie¹, Shaowen Yao⁴, Dongming Zhou⁵, Kangjian He¹

Affiliations

¹ School of Information, Yunnan University, Kunming, Yunnan Province, China.
² School of Life Sciences, Yunnan University, Kunming, Yunnan Province, China.
³ School of Software, Yunnan University, Kunming, Yunnan Province, China; Queens' College, University of Cambridge, Cambridge CB3 9ET, U.K.
⁴ School of Software, Yunnan University, Kunming, Yunnan Province, China. Electronic address: yaosw@ynu.edu.cn.
⁵ School of Information, Yunnan University, Kunming, Yunnan Province, China. Electronic address: zhoudm@ynu.edu.cn.

PMID: 28763687

Abstract

DNA sequence similarity/dissimilarity analysis is a fundamental task in computational biology: it compares DNA sequences in order to infer their evolutionary relationships. Over the past decades, growing demand has led to a large number of similarity analysis methods for DNA sequences. This survey reviews the advances in DNA sequence similarity analysis, with the aim of promoting further development of the field. We first introduce the background of DNA similarity analysis, including the data sets, similarity distances, and output data. We then review recent algorithmic developments to present the state of the art in DNA similarity analysis. Finally, we summarize the trends and challenges in this research field. We conclude that, although various DNA similarity analysis methods have been proposed, there remains room for further improvement and several potential research directions.

Keywords: DNA sequence analysis; Evolutionary relationship; Feature extraction; Graphical representation; Similarity analysis.
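
The pipeline outlined in the abstract (sequence data in, feature extraction, a similarity distance, and a pairwise output) can be illustrated with a minimal sketch. It is not taken from the survey. It assumes one common alignment-free choice, normalized k-mer frequencies compared by Euclidean distance, and the function names `kmer_profile`, `euclidean`, and `distance_matrix` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=2):
    """Feature extraction (assumed choice): normalized k-mer frequencies over ACGT."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean(u, v):
    """Similarity distance (assumed choice): Euclidean distance between profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs, k=2):
    """Output data: a symmetric matrix of pairwise distances."""
    profiles = [kmer_profile(s, k) for s in seqs]
    return [[euclidean(p, q) for q in profiles] for p in profiles]


if __name__ == "__main__":
    # Toy sequences for illustration only, not a real data set.
    for row in distance_matrix(["ACGTACGT", "ACGTACGA", "TTTTTTTT"]):
        print(["%.3f" % d for d in row])
```

A matrix of this kind is the usual input to clustering or tree-building steps; other feature vectors or distance measures can be substituted without changing the overall structure.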

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Base Composition
Base Sequence*
Computational Biology* / methods
DNA / chemistry*
Humans
Phylogeny
Reproducibility of Results
Sequence Homology, Nucleic Acid*

Substances

DNA