Transcriptome analysis and de novo annotation of the critically endangered Amur sturgeon (Acipenser schrenckii)

Genet Mol Res. 2016 Jun 20;15(2). doi: 10.4238/gmr.15027999.

Abstract

The aim of this study was to provide comprehensive insights into the genetic background of sturgeon through transcriptome analysis. We performed a de novo assembly of the Amur sturgeon (Acipenser schrenckii) transcriptome using Illumina HiSeq 2000 sequencing. A total of 148,817 non-redundant unigenes were obtained, with a combined length of 121,698,536 bp and individual lengths ranging from 201 to 26,789 bp. Homologous transcript cluster analysis classified the unigenes into 3368 distinct categories and 145,449 singletons. In all, 46,865 (31.49%) unigenes showed homologous matches in the Nr database and 32,214 (21.65%) matched the Nt database. GO analysis assigned 24,862 unigenes to 52 significantly enriched functional groups, KOG prediction classified 38,436 unigenes into 25 groups, and 45,598 unigenes mapped to 128 enriched KEGG pathways (P < 0.05). Subsequently, a total of 19,860 SSR markers were identified, among which dinucleotide repeats were the most abundant type (10,658; 53.67%) and AT/TA was the most frequent motif (2689; 13.54%). A total of 1341 conserved lncRNAs were identified using a customized pipeline. Our study provides new sequence and functional information for A. schrenckii, which will serve as a basis for further genetic studies on sturgeon species. The large number of potential SSRs and putatively conserved lncRNAs identified from the transcriptome should also support research in many fields, including evolution, conservation management, and biological processes in sturgeon.
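
The SSR survey summarized above can be illustrated with a minimal Python sketch that scans sequences for perfect tandem repeats. The abstract does not state which tool or thresholds were used, so the minimum repeat counts in `MIN_REPEATS`, the function names `find_ssrs` and `canonical_motif`, and the choice to group a motif with its rotations and reverse complement are all assumptions made for illustration, not the authors' method.

```python
import re

# Assumed minimum repeat counts per motif length (hypothetical, not from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. AA, ATAT).
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


def canonical_motif(motif):
    """Group a motif with its rotations and reverse complement (assumed convention)."""
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    candidates = []
    for s in (motif, revcomp):
        candidates.extend(s[i:] + s[:i] for i in range(len(s)))
    return min(candidates)


if __name__ == "__main__":
    # Toy sequence: one dinucleotide and one trinucleotide repeat.
    example = "GGC" + "AT" * 7 + "CCGT" + "CAG" * 5 + "TT"
    for start, motif, count in find_ssrs(example):
        print(start, motif, canonical_motif(motif), count)
```

Tallying the hits by motif length and by canonical motif across all unigenes would give counts comparable in kind to the dinucleotide and AT/TA figures reported here, although the exact numbers depend on the thresholds chosen.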

MeSH terms

  • Animals
  • Conserved Sequence
  • Endangered Species*
  • Evolution, Molecular
  • Fish Proteins / genetics*
  • Fish Proteins / metabolism
  • Fishes / genetics*
  • Microsatellite Repeats
  • Molecular Sequence Annotation
  • RNA, Long Noncoding / genetics
  • Transcriptome*

Substances

  • Fish Proteins
  • RNA, Long Noncoding