SNP discovery in European anchovy (Engraulis encrasicolus, L) by high-throughput transcriptome and genome sequencing

PLoS One. 2013 Aug 1;8(8):e70051. doi: 10.1371/journal.pone.0070051. Print 2013.

Abstract

Increased throughput in sequencing technologies has facilitated the acquisition of detailed genomic information in non-model species. The focus of this research was to discover and validate SNPs in the transcriptome of the European anchovy (Engraulis encrasicolus), a species with no available reference genome, using next-generation sequencing technologies. A cDNA library was constructed from four tissues of ten individuals from three populations of E. encrasicolus, and Roche 454 GS FLX Titanium sequencing yielded 19,367 contigs. Additionally, the genomes of the same ten individuals were sequenced using an Illumina HiSeq2000. Using a computational pipeline that combines transcriptome and genome information, a total of 18,994 SNPs met the necessary minor allele frequency and depth filters. A series of further stringent filters was applied to identify those SNPs likely to succeed in genotyping assays and to exclude those in potentially duplicated genome regions. A novel method for detecting potential intron-exon boundaries near putative SNPs was also applied in silico to improve genotyping success. In all, 2,317 filtered putative transcriptome SNPs suitable for genotyping primer design were identified. From those, a subset of 530 was selected, and the genotyping results showed the highest conversion and validation rates (91.3% and 83.2%, respectively) reported to date for a non-model species. This study represents a promising strategy to discover genotypable SNPs in the exome of non-model organisms. The genomic resource generated for E. encrasicolus, both in terms of sequences and novel markers, will be informative for research into this species, with applications including traceability studies, population genetic analyses and aquaculture.
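The minor allele frequency and depth filtering step mentioned above can be illustrated with a minimal sketch. The abstract does not state the thresholds or the data layout used, so the `SnpCall` record, the `filter_snps` function, the cutoff values (`min_maf=0.10`, `min_depth=8`) and the example SNP identifiers are all hypothetical. Estimating allele frequency from pooled read counts is likewise a simplifying assumption, not necessarily the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical record: read counts supporting each allele at one site."""
    snp_id: str
    ref_reads: int
    alt_reads: int

    @property
    def depth(self) -> int:
        # Total number of reads covering the site.
        return self.ref_reads + self.alt_reads

    @property
    def maf(self) -> float:
        # Minor allele frequency approximated from read counts (assumption).
        if self.depth == 0:
            return 0.0
        return min(self.ref_reads, self.alt_reads) / self.depth


def filter_snps(calls, min_maf=0.10, min_depth=8):
    """Keep calls meeting both thresholds; default cutoffs are illustrative only."""
    return [c for c in calls if c.depth >= min_depth and c.maf >= min_maf]


# Made-up example data, not taken from the study.
calls = [
    SnpCall("contig1_100", ref_reads=12, alt_reads=8),   # passes both filters
    SnpCall("contig2_55", ref_reads=3, alt_reads=2),     # fails the depth filter
    SnpCall("contig3_210", ref_reads=49, alt_reads=1),   # fails the MAF filter
]
kept = filter_snps(calls)
print([c.snp_id for c in kept])
```

Applying both conditions in one pass keeps the sketch short. A real pipeline would more likely apply such filters per individual or per population, using genotype calls instead of pooled read counts.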

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Chromosome Mapping
  • Exons / genetics
  • Fishes / genetics*
  • Gene Expression Profiling*
  • Genetic Loci / genetics
  • Genetic Markers / genetics
  • Genomics*
  • Genotyping Techniques
  • High-Throughput Nucleotide Sequencing*
  • Introns / genetics
  • Microsatellite Repeats / genetics
  • Molecular Sequence Annotation
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, DNA*
  • Sequence Analysis, RNA*

Substances

  • Genetic Markers

Grants and funding

This research was supported by the project ECOGENBAY (MICINN CTM2009-13570-C02-02) funded by the Ministry of Science and Research of the Government of Spain, and by a Research Grant (3571/2008) from the University of the Basque Country UPV/EHU. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.