Complete genome assembly data of Paenibacillus sp. RUD330, a hypothetical symbiont of Euglena gracilis

Data Brief. 2020 Jul 25:32:106070. doi: 10.1016/j.dib.2020.106070. eCollection 2020 Oct.

Abstract

An unknown bacterial strain was detected in the cytostome and on the cell surface of Euglena gracilis using transmission electron microscopy. To identify the bacterium and its function, we performed isolation experiments. Here we present the genome sequence of the isolate, which was determined to be Paenibacillus sp. The genome was sequenced four times, using Illumina paired-end reads, Illumina mate-pair reads (inserts of 3-4 and 6-8 kb), and Nanopore long reads (tens of thousands of nucleotides). Assemblies based on Illumina reads, including mate-pair reads, could not resolve the gaps caused by long tandem copies of rRNA genes, other tandem repeats, and extremely GC-rich regions (90-100%). Only the long Nanopore reads closed those gaps and made it possible to complete the entire genome; we also found one plasmid. The length of the genome is 5.56 Mbp, and the average GC content is 59%. The genome of Paenibacillus sp. RUD330 contains 8 copies of each rRNA gene (23S, 16S, 5S), and the plasmid is 8.3 kb long. We hope that our genome assembly and the methods used can help other investigators assemble complex genomes. Our reliable assembly could be a good basis for further physiological and genetic engineering studies of similar strains.
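
The GC figures quoted above (a 59% genome-wide average against local regions of 90-100%) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the 100-nt window, the 50-nt step, and the 0.9 threshold are assumptions chosen for illustration only, and the sequence is a toy string rather than real data.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_rich_windows(seq: str, window: int = 100, step: int = 50,
                    threshold: float = 0.9):
    """Return (start, end, gc) for sliding windows with GC >= threshold.

    Window size, step, and threshold are illustrative assumptions.
    """
    hits = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc >= threshold:
            hits.append((start, start + len(chunk), gc))
    return hits


if __name__ == "__main__":
    # Toy sequence: moderate-GC flank, a pure G/C stretch, low-GC flank.
    toy = "ATGCATGCAT" * 20 + "GCGGCCGCGG" * 20 + "ATATGCATTA" * 20
    print("overall GC:", round(gc_fraction(toy), 3))
    for start, end, gc in gc_rich_windows(toy):
        print(f"GC-rich window {start}-{end}: {gc:.2f}")
```

Run on a finished assembly, a scan of this kind would flag candidate regions where short-read coverage is likely to drop; the appropriate window size depends on read length and is left open here.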

Keywords: Genome assembly; Illumina; NGS sequencing; Nanopore; Paenibacillus.