Going beyond consensus genome sequences: An innovative SNP-based methodology reconstructs different Ugandan cassava brown streak virus haplotypes at a nationwide scale in Rwanda

Virus Evol. 2023 Aug 24;9(2):vead053. doi: 10.1093/ve/vead053. eCollection 2023.

Abstract

Cassava brown streak disease (CBSD), which is caused by cassava brown streak virus (CBSV) and Ugandan cassava brown streak virus (UCBSV), represents one of the most devastating threats to cassava production in Africa, including in Rwanda, where a dramatic epidemic in 2014 dropped cassava yield from 3.3 million to 900,000 tonnes (1). Studying viral genetic diversity at the genome level is essential in disease management, as it can provide valuable information on the origin and dynamics of epidemic events. To address the current lack of genome-based diversity studies of UCBSV, we performed a nationwide survey of cassava ipomovirus genomic sequences in Rwanda by high-throughput sequencing (HTS) of pools of plants sampled from 130 cassava fields in thirteen cassava-producing districts, spanning seven agro-ecological zones with contrasting climatic conditions and different cassava cultivars. HTS allowed the assembly of a nearly complete consensus genome of UCBSV in twelve districts. The phylogenetic analysis revealed high sequence similarity among the UCBSV genomes, with a maximum of 0.8 per cent divergence between genomes at the nucleotide level. An in-depth investigation based on single nucleotide polymorphisms (SNPs) was conducted to explore the genome diversity beyond the consensus sequences. First, to ensure the validity of the results, a panel of SNPs was confirmed by independent reverse transcription polymerase chain reaction (RT-PCR) and Sanger sequencing. Furthermore, the combination of fixation index (FST) calculation and principal component analysis (PCA) based on SNP patterns identified three geographically clustered UCBSV haplotypes. Haplotype 2 (H2) was restricted to the central regions, where the NAROCAS 1 cultivar is predominantly farmed. RT-PCR and Sanger sequencing of individual NAROCAS 1 plants confirmed their association with H2. Haplotype 1 was widespread, with a 100 per cent occurrence in the Eastern region, while Haplotype 3 was found only in the Western region. These haplotypes' associations with specific cultivars or regions would need further confirmation. Our results show that a much more complex picture of genetic diversity can be deciphered beyond the consensus sequences, with practical implications for virus epidemiology, evolution, and disease management. Our methodology proposes a high-resolution analysis of genome diversity beyond the consensus, between and within samples. It can be used at various scales, from individual plants to pooled samples of virus-infected plants. Our findings also showed how subtle genetic differences could be informative about the potential impact of agricultural practices, as the presence and frequency of a virus haplotype could be correlated with the dissemination and adoption of improved cultivars.
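
The combination of FST and PCA on SNP patterns mentioned above can be illustrated with a minimal sketch. The abstract does not state which FST estimator or PCA implementation was used, so the code below is only an assumption: it uses a Nei-style heterozygosity-based FST (ratio of sums over SNPs) and a plain SVD-based PCA. The function names `pairwise_fst` and `pca_scores` and the allele frequencies are hypothetical and are not data from the study.

```python
import numpy as np

def pairwise_fst(p1, p2):
    """Nei-style FST between two pools from per-SNP allele frequencies.

    Assumption: FST = (HT - HS) / HT, summed over SNPs before dividing.
    This is one common estimator, not necessarily the one used in the paper.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Mean within-pool expected heterozygosity per SNP
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the combined pools per SNP
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    total = ht.sum()
    if total == 0:
        return 0.0  # no polymorphism at all, so no differentiation to measure
    return float((ht - hs).sum() / total)

def pca_scores(freqs, n_components=2):
    """Project samples (rows) onto the leading principal components."""
    x = np.asarray(freqs, dtype=float)
    x = x - x.mean(axis=0)  # center each SNP column
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical alternate-allele frequencies: rows are pooled samples, columns are SNPs
freqs = np.array([
    [0.05, 0.10, 0.90, 0.02],  # pool A
    [0.08, 0.12, 0.88, 0.03],  # pool B
    [0.95, 0.85, 0.10, 0.04],  # pool C
    [0.90, 0.92, 0.07, 0.05],  # pool D
])

fst_matrix = np.array([[pairwise_fst(a, b) for b in freqs] for a in freqs])
scores = pca_scores(freqs)
print(np.round(fst_matrix, 3))
print(np.round(scores, 3))
```

In this toy input, pools A and B share one SNP pattern and pools C and D another, so pairwise FST is low within each pair and high between pairs, and the first principal component separates the two groups. Pools carrying the same haplotype would be expected to cluster in a similar way.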

Keywords: Rwanda; SNP; UCBSV; ipomovirus; cassava; high throughput sequencing.