New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing

Brief Bioinform. 2014 May;15(3):343-53. doi: 10.1093/bib/bbt067. Epub 2013 Sep 23.

Abstract

With the development of next-generation sequencing (NGS) technologies, a large amount of short-read data has been generated. Assembling these short reads can be challenging for genomes and metagenomes without template sequences, which makes alignment-based genome sequence comparison difficult. In addition, sequence reads from NGS can come from different regions of various genomes, and they may not be alignable. Sequence signature-based methods, which compare genomes and metagenomes by the frequencies of their word patterns, can potentially be useful for the analysis of short-read data from NGS. Here we review recent developments in alignment-free genome and metagenome comparison based on the frequencies of word patterns, with emphasis on the dissimilarity measures between sequences, the statistical power of these measures when two sequences are related, and the applications of these measures to NGS data.

Keywords: Markov model; NGS data; alignment-free; genome comparison; statistical power; word patterns.
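
To make the idea of comparing sequences by word-pattern frequencies concrete, the following is a minimal sketch in Python. The abstract does not say which measures the review covers, so the choice here is an assumption for illustration only: it counts overlapping words of length k and takes the inner product of two count vectors, a classical word-count statistic commonly called D2. The function names (kmer_counts, pooled_counts, d2) are hypothetical and do not come from the paper.

```python
from collections import Counter

DNA = set("ACGT")

def kmer_counts(seq, k):
    """Count overlapping words of length k, skipping words with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= DNA:
            counts[word] += 1
    return counts

def pooled_counts(reads, k):
    """Sum word counts over a collection of unassembled reads (illustrative helper)."""
    total = Counter()
    for read in reads:
        total.update(kmer_counts(read, k))
    return total

def d2(counts_a, counts_b):
    """Inner product of two word-count vectors (uncentred, unnormalised)."""
    # A Counter returns 0 for words it has not seen, so absent words add nothing.
    return sum(n * counts_b[word] for word, n in counts_a.items())

if __name__ == "__main__":
    genome_like = kmer_counts("ACGTACGTTGCA", 3)
    read_set = pooled_counts(["ACGTAC", "GTTGCA"], 3)
    print(d2(genome_like, read_set))
```

Because the counts are pooled over reads, no assembly or alignment is needed. Note that this raw inner product ignores the background word frequencies expected under a sequence model, so it is only a starting point, not a calibrated dissimilarity measure.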

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computational Biology / trends
  • Genomics / methods
  • Genomics / statistics & numerical data
  • High-Throughput Nucleotide Sequencing
  • Markov Chains
  • Models, Statistical
  • Sequence Alignment
  • Sequence Analysis / methods*
  • Sequence Analysis / statistics & numerical data