Incomplete Lineage Sorting and Hybridization Statistics for Large-Scale Retroposon Insertion Data

PLoS Comput Biol. 2016 Mar 11;12(3):e1004812. doi: 10.1371/journal.pcbi.1004812. eCollection 2016 Mar.

Abstract

Ancient retroposon insertions can be used as virtually homoplasy-free markers to reconstruct the phylogenetic history of species. Inherited, orthologous insertions in related species offer reliable signals of a common origin of the given species. One prerequisite for such a phylogenetically informative insertion is that the inserted element was fixed in the ancestral population before speciation; if not, polymorphically inserted elements may lead to random distributions of presence/absence states during speciation and possibly to apparently conflicting reconstructions of their ancestry. Fortunately, such misleading fixed cases are relatively rare, but they nevertheless need to be considered. Here, we present novel, comprehensive statistical models applicable to (1) analyzing any pattern of rare genomic changes, (2) testing and differentiating conflicting phylogenetic reconstructions based on rare genomic changes caused by incomplete lineage sorting and/or ancestral hybridization, and (3) differentiating between search strategies involving genome information from one or several lineages. When the new statistics are applied to non-conflicting cases, a minimum of three elements present in both of two species and absent in a third group is considered significant support (p<0.05) for the branching of the third from the other two, provided that all three of the given species are screened equally for genome or experimental data. Five elements are necessary for significant support (p<0.05) if a diagnostic locus derived from only one of the three species is screened and no conflicting markers are detected. Most potentially conflicting patterns can be evaluated for their significance, and ancestral hybridization can be distinguished from incomplete lineage sorting by considering the symmetric or asymmetric distribution of rare genomic changes among possible tree configurations. Additionally, we provide an R application to make the new KKSC insertion significance test available to the scientific community at http://retrogenomics.uni-muenster.de:3838/KKSC_significance_test/.
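The two thresholds quoted in the abstract (three markers when all three species are screened, five when only one is) can be illustrated with a deliberately simplified null model. The sketch below is not the KKSC statistic and is not taken from the paper. It assumes, purely for illustration, that under the null hypothesis each conflict-free marker falls with equal probability on any of the tree topologies the screening strategy can detect: three topologies when all three lineages are screened, and two when loci come from a single reference lineage. The function names and the equal-probability assumption are hypothetical.

```python
def null_p_value(n_markers: int, n_detectable_topologies: int) -> float:
    """Probability that all n_markers support one given topology by chance.

    ASSUMPTION (illustrative only, not the KKSC test): each marker lands
    independently and uniformly on one of the detectable topologies.
    """
    return (1.0 / n_detectable_topologies) ** n_markers


def min_markers_for_significance(
    n_detectable_topologies: int, alpha: float = 0.05, max_markers: int = 100
) -> int:
    """Smallest number of conflict-free markers with null p-value below alpha."""
    for n in range(1, max_markers + 1):
        if null_p_value(n, n_detectable_topologies) < alpha:
            return n
    raise ValueError("no marker count up to max_markers reaches significance")


if __name__ == "__main__":
    # All three species screened equally: three detectable topologies.
    print(min_markers_for_significance(3))  # 3, since (1/3)^3 is about 0.037
    # Loci derived from one species only: two detectable topologies.
    print(min_markers_for_significance(2))  # 5, since (1/2)^5 is about 0.031
```

Under these assumptions the toy model yields the same minimum counts as the abstract, but the published test also handles conflicting marker patterns and hybridization, which this sketch does not attempt.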

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Computer Simulation
  • DNA Transposable Elements / genetics*
  • High-Throughput Nucleotide Sequencing / methods
  • Hybridization, Genetic / genetics*
  • Models, Genetic*
  • Models, Statistical*
  • Molecular Sequence Data
  • Mutagenesis, Insertional / genetics*
  • Retroelements / genetics*
  • Software

Substances

  • DNA Transposable Elements
  • Retroelements

Grants and funding

This study was financially supported by the Deutsche Forschungsgemeinschaft (SCHM1469/3-2; SCHM1469/4-1; SCHM1469/5-1; KR3639/1-1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.