Unmapped reads from cattle RNAseq data: A source for missing and misassembled sequences in the reference assemblies and for detection of pathogens in the host

Genomics. 2017 Jan;109(1):36-42. doi: 10.1016/j.ygeno.2016.11.009. Epub 2016 Nov 29.

Abstract

Reads from transcriptome sequencing data that do not map to the target species' reference genome are usually disregarded. A recent RNAseq project on the new fatal disease Bovine Neonatal Pancytopenia indicated an unexplained immune response signature of a response to a double-stranded RNA virus. To investigate its origin, we assembled contigs de novo from unmapped RNAseq reads and aligned them against the bovine genome assemblies and multispecies NCBI databases. The absence of genuine virus sequence contigs refuted the hypothesis that a live virus caused the unexplained immune response. The alignment data also showed that the bovine reference genome assemblies are incomplete. In addition, we found that several parasite and virus genome reference assemblies in NCBI were contaminated with bovine DNA, and we confirmed recombination of bovine DNA into BVD virus strains. Exploring unmapped reads can yield useful biological information about the presence of microorganisms and can highlight issues with the reference genome assemblies of host and pathogen species.
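
The first step of such a workflow, collecting the reads that did not map, can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: it assumes the alignments are available as SAM-format text, relies on the SAM FLAG bit 0x4 ("segment unmapped"), and writes the selected reads as FASTA for a downstream assembler. The function names `iter_unmapped` and `to_fasta` and the example records are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def iter_unmapped(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) for SAM records with the 0x4 bit set."""
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        flag = int(fields[1])
        if flag & UNMAPPED_FLAG:
            yield fields[0], fields[9]  # QNAME, SEQ


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Hypothetical records: one mapped read, one unmapped single-end read,
# and one unmapped first-in-pair read (FLAG 77 = 1 + 4 + 8 + 64).
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\tIIIIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGGGAAAA\tIIIIIIII",
]

unmapped = list(iter_unmapped(example))
fasta_text = to_fasta(unmapped)
```

The resulting FASTA text would then be passed to a de novo assembler, and the assembled contigs aligned against host and multispecies databases, as the abstract describes; those steps depend on external tools and are not shown here.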

Keywords: Bioinformatics; Next generation sequencing; RNAseq; Unmapped reads.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cattle / genetics*
  • Cattle / microbiology
  • Cattle / parasitology
  • Cattle / virology
  • Computational Biology
  • Female
  • Genome*
  • High-Throughput Nucleotide Sequencing / standards*
  • Sequence Analysis, RNA / standards*