Estimating the number and assignment of clock models in analyses of multigene datasets

Bioinformatics. 2016 May 1;32(9):1281-5. doi: 10.1093/bioinformatics/btw005. Epub 2016 Jan 6.

Abstract

Motivation: Molecular-clock methods can be used to estimate evolutionary rates and timescales from DNA sequence data. However, different genes can display different patterns of rate variation across lineages, calling for the employment of multiple clock models. Selecting the optimal clock-partitioning scheme for a multigene dataset can be computationally demanding, but clustering methods provide a feasible alternative. We investigated the performance of different clustering methods using data from chloroplast genomes and data generated by simulation.

Results: Our results show that mixture models provide a useful alternative to traditional partitioning algorithms. We found only a small number of distinct patterns of among-lineage rate variation among chloroplast genes, which were consistent across taxonomic scales. This suggests that the evolution of chloroplast genes has been governed by a small number of genomic pacemakers. Our study also demonstrates that clustering methods provide an efficient means of identifying clock-partitioning schemes for genome-scale datasets.

Availability and implementation: The code and data sets used in this study are available online at https://github.com/sebastianduchene/pacemaker_clustering_methods

Contact: sebastian.duchene@sydney.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Animals
  • Cluster Analysis
  • Genome
  • Genomics*
  • Humans
  • Models, Genetic*