Single-molecule Real-time (SMRT) Sequencing Facilitates Transcriptome Research and Genome Annotation of the Fish Sillago sinica

Mar Biotechnol (NY). 2022 Oct;24(5):1002-1013. doi: 10.1007/s10126-022-10163-7. Epub 2022 Sep 9.

Abstract

As a newly described species of Sillaginidae, the Chinese sillago (Sillago sinica) still lacks comprehensive gene annotation information. In this study, we report the first full-length transcriptome of S. sinica, generated with PacBio isoform sequencing (Iso-Seq), together with an analysis of transcriptome structure. A total of 454,979 high-quality full-length transcripts were obtained by single-molecule real-time (SMRT) sequencing and corrected with Illumina sequencing data. Mapping these transcripts to the S. sinica reference genome yielded 66,948 non-redundant full-length transcripts, including 49 fusion isoforms and 9,250 novel isoforms. In total, 63,459 isoforms were successfully annotated in at least one of the Nr, Nt, SwissProt, Pfam, KOG, GO, and KEGG databases. Additionally, 30,987 alternative polyadenylation (APA) sites, 451,867 alternative splicing (AS) events, 21,928 long non-coding RNAs (lncRNAs), and 12,911 transcription factors (TFs) were identified. These full-length transcripts should provide a valuable resource for characterizing the S. sinica transcriptome and for further study of gene function and regulatory mechanisms in this species.

Keywords: Full-length transcriptome; Genome annotation; SMRT sequencing; Sand whiting; Sillago sinica.

MeSH terms

  • Alternative Splicing
  • Animals
  • Fishes / genetics
  • Gene Expression Profiling
  • Genome
  • High-Throughput Nucleotide Sequencing
  • Molecular Sequence Annotation
  • Protein Isoforms / genetics
  • RNA, Long Noncoding* / genetics
  • Transcription Factors / genetics
  • Transcriptome*

Substances

  • Protein Isoforms
  • RNA, Long Noncoding
  • Transcription Factors