Reads from transcriptome sequencing data that do not map to the reference genome of the target species are usually discarded. A recent RNAseq project on the new fatal disease bovine neonatal pancytopenia indicated an unexplained immune response signature resembling the response to a double-stranded RNA virus. To investigate its origin, contigs were assembled de novo from the unmapped RNAseq reads and aligned against the bovine genome assemblies and multispecies NCBI databases. The absence of genuine viral sequence contigs led us to reject the hypothesis that a live virus caused the unexplained immune response. The alignment data also showed that the bovine reference genome assemblies are incomplete. In addition, we found that several parasite and virus reference genome assemblies in NCBI were contaminated with bovine DNA, and we confirmed recombination of bovine DNA into BVD virus strains. Exploring unmapped reads can yield useful biological information about the presence of microorganisms and can highlight problems in the reference genome assemblies of both host and pathogen species.
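As an illustration of the first step described above (setting aside the reads that did not map to the host reference), the following is a minimal sketch, not the pipeline used in this study. It assumes the alignments are available as SAM text and relies only on the SAM convention that FLAG bit 0x4 marks an unmapped segment. The function names `unmapped_reads` and `to_fastq` and the example records are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for every unmapped SAM record.

    Assumption: plain-text SAM input. Reverse-complement handling
    (FLAG bit 0x10) is deliberately left out of this sketch.
    """
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & UNMAPPED_FLAG:
            yield name, seq, qual


def to_fastq(records: Iterable[Tuple[str, str, str]]) -> str:
    """Format records as FASTQ text, ready for a de novo assembler."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)


# Hypothetical example input: one header line, one mapped read, one unmapped read.
example_sam = [
    "@HD\tVN:1.6\n",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tFFFF\n",
]
example_fastq = to_fastq(unmapped_reads(example_sam))
```

The resulting FASTQ text would then be passed to an assembler, and the assembled contigs aligned against host and multispecies databases; those later steps depend on external tools and are not sketched here.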
Keywords: Bioinformatics; Next generation sequencing; RNAseq; Unmapped reads.
Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.