Haplotype-Phased Genome Assembly of Virulent Phytophthora ramorum Isolate ND886 Facilitated by Long-Read Sequencing Reveals Effector Polymorphisms and Copy Number Variation

Mol Plant Microbe Interact. 2019 Aug;32(8):1047-1060. doi: 10.1094/MPMI-08-18-0222-R. Epub 2019 Jun 10.

Abstract

Phytophthora ramorum is a destructive pathogen that causes sudden oak death. The genome sequence of P. ramorum isolate Pr102 was previously produced using Sanger reads and contained 12 Mb of gaps. However, isolate Pr102 had shown reduced aggressiveness and genome abnormalities. To produce an improved genome assembly for P. ramorum, we performed long-read sequencing of the highly aggressive P. ramorum isolate CDFA1418886 (abbreviated as ND886). We generated a 60.5-Mb assembly of the ND886 genome using the Pacific Biosciences (PacBio) sequencing platform. The assembly includes 302 primary contigs (60.2 Mb) and nine unplaced contigs (265 kb). Additionally, we found a 'highly repetitive' component in the unassembled, unmapped PacBio reads, containing tandem repeats that are not part of the 60.5-Mb genome. The overall repeat content in the primary assembly was much higher than in the Pr102 Sanger version (48% versus 29%), indicating that the long reads captured repetitive regions effectively. The 302 primary contigs were phased into 345 haplotype blocks containing 222,892 phased variants; the longest phased block was 1,513,201 bp with 7,265 phased variants. The improved phased assembly facilitated identification of 21 and 25 Crinkler effectors and 393 and 394 RXLR effector genes from the two haplotypes. Of these, 24 and 25 RXLR effectors were newly predicted from haplotypes A and B, respectively. In addition, seven previously unreported paralogs of effector Avh207 were found in contig 54. Comparison of the ND886 assembly with the Pr102 V1 assembly suggests that several repeat-rich smaller scaffolds within the Pr102 V1 assembly were possibly misassembled; these regions are now fully encompassed within ND886 contigs. Our analysis further reveals that Pr102 is a heterokaryon with multiple nuclear types in the sequences corresponding to contig 10 of the ND886 assembly.
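As a quick sanity check on the figures quoted above, the following minimal Python sketch recomputes the assembly total and the phased-variant density. It uses only the rounded values stated in the abstract. The variable names and the `per_kb` helper are illustrative, not part of the authors' pipeline, and the genome-wide density assumes (for illustration only) that the phased variants are spread over the full 60.2 Mb of primary contigs.

```python
# Figures quoted in the abstract (rounded values; sizes in base pairs).
PRIMARY_CONTIGS_BP = 60_200_000    # 302 primary contigs, 60.2 Mb
UNPLACED_CONTIGS_BP = 265_000      # nine unplaced contigs, 265 kb
PHASED_VARIANTS = 222_892          # phased variants across 345 haplotype blocks
LONGEST_BLOCK_BP = 1_513_201       # longest phased block
LONGEST_BLOCK_VARIANTS = 7_265     # phased variants in the longest block


def per_kb(count: int, span_bp: int) -> float:
    """Return the number of items per 1,000 bp of sequence (hypothetical helper)."""
    return count / span_bp * 1000


# Primary plus unplaced contigs should reproduce the reported assembly size.
total_mb = (PRIMARY_CONTIGS_BP + UNPLACED_CONTIGS_BP) / 1e6

# Assumption: variants are averaged over the whole primary assembly.
genome_density = per_kb(PHASED_VARIANTS, PRIMARY_CONTIGS_BP)

# Density within the single longest phased block, using its exact length.
block_density = per_kb(LONGEST_BLOCK_VARIANTS, LONGEST_BLOCK_BP)

print(f"Total assembly: {total_mb:.1f} Mb")
print(f"Phased variants per kb (primary assembly): {genome_density:.2f}")
print(f"Phased variants per kb (longest block): {block_density:.2f}")
```

Under these assumptions, the two contig sets sum to roughly the reported 60.5 Mb, and the longest block carries a somewhat higher variant density than the assembly-wide average.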

Keywords: diseases; genomics.

MeSH terms

  • DNA Copy Number Variations*
  • Genome, Protozoan* / genetics
  • Haplotypes
  • Phytophthora* / genetics
  • Polymorphism, Genetic*