The first chromosome-level genome assembly of Entomobrya proxima Folsom, 1924 (Collembola: Entomobryidae)

Sci Data. 2023 Aug 16;10(1):541. doi: 10.1038/s41597-023-02456-w.

Abstract

The Entomobryoidea, the largest superfamily of Collembola, encompasses over 2,000 species in the world. However, the lack of high-quality genomes hinders our understanding of the evolution and ecology of this group. This study presents a chromosome-level genome of Entomobrya proxima by combining PacBio long reads, Illumina short reads, and Hi-C data. The genome has a size of 362.37 Mb, with a scaffold N50 size of 57.67 Mb, and 97.12% (351.95 Mb) of the assembly is located on six chromosomes. The BUSCO analysis of our assembly indicates a completeness of 96.1% (n = 1,013), including 946 (93.4%) single-copy BUSCOs and 27 (2.7%) duplicated BUSCOs. We identified that the genome contains 22.16% (80.06 Mb) repeat elements and 20,988 predicted protein-coding genes. Gene family evolution analysis of E. proxima identified 177 gene families that underwent significant expansions, which were primarily associated with detoxification and metabolism. Moreover, our inter-genomic synteny analysis showed strong chromosomal synteny between E. proxima and Sinella curviseta. Our study provides valuable genomic information for comprehending the evolution and ecology of Collembola.

Publication types

  • Dataset
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Arthropods* / genetics
  • Ecology
  • Genome*
  • Genomics