Chromosomal-level assembly of the blood clam, Scapharca (Anadara) broughtonii, using long sequence reads and Hi-C

Chang-Ming Bai; Lu-Sheng Xin; Umberto Rosani; Biao Wu; Qing-Chen Wang; Xiao-Ke Duan; Zhi-Hong Liu; Chong-Ming Wang

doi:10.1093/gigascience/giz067

Gigascience. 2019 Jul 1;8(7):giz067. doi: 10.1093/gigascience/giz067.

Authors

Chang-Ming Bai¹, Lu-Sheng Xin¹, Umberto Rosani²,³, Biao Wu¹, Qing-Chen Wang¹, Xiao-Ke Duan⁴, Zhi-Hong Liu¹, Chong-Ming Wang¹

Affiliations

¹ Key Laboratory of Maricultural Organism Disease Control, Ministry of Agriculture; Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology; Qingdao Key Laboratory of Mariculture Epidemiology and Biosecurity; Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Road, Qingdao 266071, China.
² Department of Biology, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy.
³ Alfred Wegener Institute - Helmholtz Centre for Polar and Marine Research, Wadden Sea Station, Hafenstraße 43, List/Sylt 25992, Germany.
⁴ Biomarker Technologies Corporation, 12 Fuqian Street, Beijing 101200, China.

Abstract

Background: The blood clam, Scapharca (Anadara) broughtonii, is an economically and ecologically important marine bivalve of the family Arcidae. Efforts to study its population genetics, breeding, cultivation, and stock enrichment have been somewhat hindered by the lack of a reference genome. Herein, we report the complete genome sequence of S. broughtonii, the first reference genome for the family Arcidae.

Findings: A total of 75.79 Gb of clean data were generated with the Pacific Biosciences and Oxford Nanopore platforms, representing approximately 86× coverage of the S. broughtonii genome. De novo assembly of these long reads resulted in an 884.5-Mb genome, with a contig N50 of 1.80 Mb and a scaffold N50 of 45.00 Mb. Hi-C scaffolding of the genome resulted in 19 chromosomes containing 99.35% of the bases in the assembled genome. Genome annotation revealed that nearly half of the genome (46.1%) is composed of repetitive sequences; 24,045 protein-coding genes were predicted, of which 84.7% were annotated.
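As an illustration of the summary statistics quoted above, the following minimal Python sketch shows how sequencing depth and N50 are conventionally computed. It assumes that the reported coverage is the total clean data divided by the assembled genome size (the abstract does not state which denominator was used), and the function names and the toy contig lengths are hypothetical, not taken from the study.

```python
from typing import Iterable


def coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Figures from the abstract: 75.79 Gb of reads, 884.5-Mb assembly.
# Assumption: the assembled size is used as the genome size.
depth = coverage(75.79e9, 884.5e6)
print(f"approximate coverage: {depth:.1f}x")

# Hypothetical toy contig lengths, for illustration only.
toy_contigs = [10, 8, 6, 4, 2]
print("toy N50:", n50(toy_contigs))
```

Under this assumption the depth comes out near the reported 86×. The same `n50` function applies to contigs or scaffolds alike; only the input list of lengths differs.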

Conclusions: We report here a chromosomal-level assembly of the S. broughtonii genome based on long-read sequencing and Hi-C scaffolding. The genomic data can serve as a reference for the family Arcidae and will provide a valuable resource for the scientific community and aquaculture sector.

Keywords: Hi-C; PacBio; ark shell; chromosomal assembly; genomic.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Animals
Bivalvia / genetics*
Chromosomes / genetics*
Contig Mapping
Genome*
Molecular Sequence Annotation
Whole Genome Sequencing