A Global Approach to Estimating the Abundance and Duplication of Polyketide Synthase Domains in Dinoflagellates

Evol Bioinform Online. 2021 Jul 14;17:11769343211031871. doi: 10.1177/11769343211031871. eCollection 2021.

Abstract

Many dinoflagellate species make toxins in a myriad of molecular configurations, but the underlying chemistry in all cases presumably proceeds via modular synthases, primarily polyketide synthases. In many organisms, modular synthases occur as discrete synthetic genes or as domains within a gene that act in coordination, thus forming a module that produces a particular fragment of a natural product. The modules usually occur in tandem as gene clusters with a syntenic arrangement that is often predictive of the resultant structure. Dinoflagellate genomes, however, are notoriously complex, with individual genes present in many tandem repeats and very few synthetic modules occurring as gene clusters, unlike what has been seen in bacteria and fungi. Nevertheless, modular synthesis in all organisms requires a thiolation domain, which provides a free thiol group that acts as a carrier for sequential synthesis. We scanned 47 dinoflagellate transcriptomes for 23 modular synthase domain models and compared their abundance among 10 orders of dinoflagellates, as well as their co-occurrence with thiolation domains. The total count of domains was quite large, with over 30 000 identified, 29 000 of which were in the core dinoflagellates. Although there were no specific trends in domain abundance associated with types of toxins, there were readily observable lineage-specific differences. The Gymnodiniales, makers of long polyketide toxins such as brevetoxin and karlotoxin, had a high relative abundance of thiolation domains as well as multiple thiolation domains within a single transcript. Orders such as the Gonyaulacales, makers of small polyketides such as spirolides, had fewer thiolation domains but a relative increase in the number of acyl transferases.
Unique to the core dinoflagellates, however, were thiolation domains occurring alongside tetratricopeptide repeats, especially hexa- and hepta-repeats. These repeats facilitate protein-protein interactions and may explain the scaffolding required for synthetic complexes capable of making large toxins. Clustering analysis for each type of domain was also used to discern possible origins of duplication for the multitude of single-domain transcripts. Single-domain transcripts frequently clustered with synonymous domains from multi-domain transcripts, such as the BurA- and ZmaK-like genes and the multi-ketosynthase genes, sometimes with a large degree of apparent gene duplication, while fatty acid synthesis genes formed distinct clusters. Surprisingly, the acyl transferases and ketoreductases involved in fatty acid synthesis (FabD and FabG, respectively) were found in very large clusters, indicating an unprecedented degree of gene duplication for these genes. These results demonstrate a complex evolutionary history of core dinoflagellate modular synthases, with domain-specific duplications throughout the lineage, and offer clues to how large protein complexes can be assembled to synthesize the largest natural products known.
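
The abundance and co-occurrence comparisons described above can be illustrated with a minimal Python sketch. It assumes that domain hits have already been parsed into (transcript, order, domain) tuples. The tuple layout, the function names, and the toy records are hypothetical and are not the study's pipeline or data.

```python
from collections import Counter, defaultdict

# Hypothetical toy hits: (transcript_id, order, domain_model).
# Invented for illustration only; not results from the study.
HITS = [
    ("t1", "Gymnodiniales", "thiolation"),
    ("t1", "Gymnodiniales", "thiolation"),
    ("t1", "Gymnodiniales", "ketosynthase"),
    ("t2", "Gymnodiniales", "thiolation"),
    ("t3", "Gonyaulacales", "acyl_transferase"),
    ("t3", "Gonyaulacales", "thiolation"),
    ("t4", "Gonyaulacales", "acyl_transferase"),
]


def relative_abundance(hits):
    """Return, per order, each domain's share of that order's total hits."""
    counts = defaultdict(Counter)
    for _tid, order, domain in hits:
        counts[order][domain] += 1
    result = {}
    for order, per_domain in counts.items():
        total = sum(per_domain.values())
        result[order] = {d: n / total for d, n in per_domain.items()}
    return result


def multi_domain_transcripts(hits, domain="thiolation", min_copies=2):
    """List transcripts carrying at least `min_copies` hits of `domain`."""
    per_transcript = Counter(tid for tid, _order, d in hits if d == domain)
    return sorted(t for t, n in per_transcript.items() if n >= min_copies)


def cooccurring_domains(hits, domain="thiolation"):
    """Count how many transcripts pair `domain` with each other domain type."""
    by_transcript = defaultdict(set)
    for tid, _order, d in hits:
        by_transcript[tid].add(d)
    partners = Counter()
    for domains in by_transcript.values():
        if domain in domains:
            partners.update(domains - {domain})
    return partners


if __name__ == "__main__":
    print(relative_abundance(HITS))
    print(multi_domain_transcripts(HITS))
    print(cooccurring_domains(HITS))
```

Normalizing counts within each order, rather than comparing raw totals, keeps orders represented by different numbers of transcriptomes comparable; that normalization is a design choice of this sketch, not a statement of how the study computed relative abundance.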

Keywords: Clustering; Dinoflagellate; Gene Duplication; Hidden Markov Model; Polyketide; Toxin.