Motivation: Molecular-clock methods can be used to estimate evolutionary rates and timescales from DNA sequence data. However, different genes can display different patterns of rate variation across lineages, which calls for the use of multiple clock models. Selecting the optimal clock-partitioning scheme for a multigene dataset can be computationally demanding, but clustering methods provide a feasible alternative. We investigated the performance of different clustering methods using data from chloroplast genomes and data generated by simulation.
Results: Our results show that mixture models provide a useful alternative to traditional partitioning algorithms. We found only a small number of distinct patterns of among-lineage rate variation among chloroplast genes, which were consistent across taxonomic scales. This suggests that the evolution of chloroplast genes has been governed by a small number of genomic pacemakers. Our study also demonstrates that clustering methods provide an efficient means of identifying clock-partitioning schemes for genome-scale datasets.
Availability and implementation: The code and datasets used in this study are available online at https://github.com/sebastianduchene/pacemaker_clustering_methods
Contact: sebastian.duchene@sydney.edu.au
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.