Streamlining universal single-copy orthologue and ultraconserved element design: A case study in Collembola

Mol Ecol Resour. 2020 May;20(3). doi: 10.1111/1755-0998.13146. Epub 2020 Mar 4.

Abstract

Genomic data sets are increasingly central to ecological and evolutionary biology, but genomic resources for invertebrates remain comparatively scarce. Powerful new computational tools and the rapidly decreasing cost of Illumina sequencing are beginning to change this, enabling rapid genome assembly and reference marker extraction. We have developed and tested a practical workflow for developing genomic resources in nonmodel groups, using real-world data on Collembola (springtails), one of the most dominant groups of soil animals on Earth. We designed two universal molecular marker sets, benchmarking universal single-copy orthologues (BUSCOs) and ultraconserved elements (UCEs), using three existing and 11 newly generated genomes. Both marker types were tested in silico via marker capture success and phylogenetic performance. The new genomes were assembled from Illumina short reads, and 9,585-14,743 protein-coding genes were predicted using ab initio and protein homology evidence. We identified 1,997 BUSCOs across the 14 genomes and created and assessed a custom BUSCO data set for extracting single-copy genes. We also developed a new UCE probe set containing 46,087 baits targeting 1,885 loci. We successfully captured 1,437-1,865 BUSCOs and 975-1,186 UCEs across the 14 genomes. Phylogenomic reconstructions using these markers proved robust, giving new insight into deep-time collembolan relationships. Our study demonstrates the feasibility of generating thousands of universal markers from highly efficient whole-genome sequencing, providing a valuable resource for genome-scale investigations in evolutionary biology and ecology.
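
The capture figures above can be put on a common scale with a little arithmetic. The following is a minimal sketch, not the authors' analysis: it assumes that capture success is measured against the full marker sets quoted in the abstract (1,997 BUSCOs and 1,885 UCE loci), and the function and variable names are hypothetical, chosen only for illustration.

```python
# Figures quoted in the abstract. Using them as denominators for
# capture success is an assumption of this sketch.
TOTAL_BUSCOS = 1997      # BUSCOs identified across the 14 genomes
TOTAL_UCE_LOCI = 1885    # loci targeted by the UCE probe set
TOTAL_BAITS = 46087      # baits in the UCE probe set


def capture_rate(captured: int, total: int) -> float:
    """Return the percentage of markers in a set recovered from one genome."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * captured / total


# Lowest and highest per-genome capture counts reported in the abstract.
busco_range = (capture_rate(1437, TOTAL_BUSCOS), capture_rate(1865, TOTAL_BUSCOS))
uce_range = (capture_rate(975, TOTAL_UCE_LOCI), capture_rate(1186, TOTAL_UCE_LOCI))

# Average number of baits designed per targeted UCE locus.
baits_per_locus = TOTAL_BAITS / TOTAL_UCE_LOCI

print(f"BUSCO capture: {busco_range[0]:.1f}%-{busco_range[1]:.1f}%")
print(f"UCE capture:   {uce_range[0]:.1f}%-{uce_range[1]:.1f}%")
print(f"Baits per UCE locus (mean): {baits_per_locus:.2f}")
```

Under these assumptions, per-genome BUSCO recovery spans roughly 72% to 93% of the custom set, UCE recovery roughly 52% to 63% of the targeted loci, and the probe set averages about 24 baits per locus.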

Keywords: benchmarking universal single-copy orthologue; next-generation sequencing; phylogenomics; ultraconserved element; universal markers; whole-genome sequencing.

MeSH terms

  • Animals
  • Arthropods / genetics*
  • Biological Evolution
  • Conserved Sequence / genetics*
  • Genetic Loci / genetics
  • Genome / genetics
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Phylogeny
  • Sequence Analysis, DNA / methods
  • Whole Genome Sequencing / methods