In silico hybridization enables transcriptomic illumination of the nature and evolution of Myxozoa

BMC Genomics. 2015 Oct 23:16:840. doi: 10.1186/s12864-015-2039-6.

Abstract

Background: The Myxozoa, a group of oligocellular, obligate endoparasites, has long been poorly understood in an evolutionary context. Recent genome-level sequencing techniques such as RNA-seq have generated large amounts of myxozoan sequence data, providing valuable insight into their evolutionary history. However, sequences from host tissue contamination are present in next-generation sequencing reactions of myxozoan tissue, and differentiating host from parasite sequences has been inadequately addressed. In order to shed light on the genetic underpinnings of myxozoan biology, assembled contigs from these studies that derive from the myxozoan must be separated from transcripts derived from host tissue and other contamination. This study describes a pipeline for categorization of transcripts as myxozoan based on similarity searching with known host and parasite sequences, explores the extent to which host contamination is present in previously existing myxozoan datasets, and implements this pipeline on a newly sequenced transcriptome of Myxobolus pendula, a parasite of the gill arch of the common creek chub.

Methods: The in silico hybridization pipeline uses iterative BLAST searching and database-driven e-value comparison to categorize transcripts as deriving from host, parasite, or other contamination. Functional genetic analysis of M. pendula was conducted using further BLAST searching, Hidden Markov Modeling, and sequence alignment and phylogenetic reconstruction.
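
The e-value comparison step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not give thresholds or decision rules, so the cutoff `max_evalue`, the margin `min_log_gap`, the helper names, and the `ambiguous` and `unassigned` labels are all assumptions made for illustration. The sketch takes each contig's best e-value against a host database and a parasite database (read from standard 12-column BLAST tabular output, where the e-value is the eleventh column) and assigns the contig to whichever source matches more strongly.

```python
import math
from typing import Dict, Iterable, Optional


def best_evalues(tabular_lines: Iterable[str]) -> Dict[str, float]:
    """Return the smallest e-value per query from 12-column BLAST tabular lines."""
    best: Dict[str, float] = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, evalue = fields[0], float(fields[10])  # column 11 is the e-value
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best


def _log10(evalue: float) -> float:
    # BLAST reports 0.0 for extremely strong hits; floor it to keep log10 defined.
    return math.log10(max(evalue, 1e-300))


def categorize(
    host_evalue: Optional[float],
    parasite_evalue: Optional[float],
    max_evalue: float = 1e-5,   # assumed significance cutoff
    min_log_gap: float = 2.0,   # assumed margin, in orders of magnitude
) -> str:
    """Label a contig by comparing its best host and parasite e-values."""
    host_hit = host_evalue is not None and host_evalue <= max_evalue
    parasite_hit = parasite_evalue is not None and parasite_evalue <= max_evalue
    if not host_hit and not parasite_hit:
        return "unassigned"
    if parasite_hit and not host_hit:
        return "parasite"
    if host_hit and not parasite_hit:
        return "host"
    # Both databases matched: the smaller e-value wins only by a clear margin.
    gap = _log10(host_evalue) - _log10(parasite_evalue)
    if gap >= min_log_gap:
        return "parasite"
    if gap <= -min_log_gap:
        return "host"
    return "ambiguous"


def categorize_contigs(
    host_hits: Dict[str, float], parasite_hits: Dict[str, float]
) -> Dict[str, str]:
    """Categorize every contig that has a hit in at least one database."""
    contigs = set(host_hits) | set(parasite_hits)
    return {
        c: categorize(host_hits.get(c), parasite_hits.get(c)) for c in sorted(contigs)
    }
```

Comparing e-values on a log scale, with a required margin, keeps near-ties from being forced into either category; contigs labelled `ambiguous` would be candidates for a further round of searching, which is one possible reading of the "iterative" searching described above.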

Results: Three RNA libraries of encysted M. pendula plasmodia were sequenced and subjected to the method. Nearly half of the final set of contiguous assembly sequences (47.3 %) was identified as putative myxozoan transcripts. Putative contamination was also identified in at least one third of previously published myxozoan transcripts. The set of M. pendula transcripts was mined for a range of biologically insightful genes, including taxonomically restricted nematocyst structural proteins and nematocyst proteins identified through tandem mass spectrometry of other cnidarians. Several novel findings emerged, including a fourth myxozoan minicollagen gene, putative myxozoan toxin proteins, and extracellular matrix glycoproteins.

Conclusions: This study serves as a model for the handling of next-generation myxozoan sequence data. The need for careful categorization was demonstrated in both previously published and new sets of myxozoan sequences. The final set of confidently assigned myxozoan transcripts can be mined for any biologically relevant gene or gene family without spurious misidentification of host contamination as a myxozoan homolog. As exemplified by M. pendula, the repertoire of myxozoan polar capsules may be more complex than previously thought, with an additional minicollagen homolog and putative expression of toxin proteins.

MeSH terms

  • Animals
  • Computer Simulation
  • Genome
  • High-Throughput Nucleotide Sequencing / methods*
  • Myxozoa / genetics*
  • Phylogeny*
  • Toxins, Biological / genetics
  • Transcriptome / genetics*

Substances

  • Toxins, Biological