An update on sORFs.org: a repository of small ORFs identified by ribosome profiling

Nucleic Acids Res. 2018 Jan 4;46(D1):D497-D502. doi: 10.1093/nar/gkx1130.

Abstract

sORFs.org (http://www.sorfs.org) is a public repository of small open reading frames (sORFs) identified by ribosome profiling (RIBO-seq). This update describes the major improvements implemented since its initial release. sORFs.org now supports three additional species (zebrafish, rat and Caenorhabditis elegans) and currently includes 78 RIBO-seq datasets, a vast increase over the three that were processed in the initial release. To accommodate these datasets, a novel pipeline was constructed that also enables sORF detection in RIBO-seq datasets comprising solely elongating RIBO-seq data, whereas previously matching initiating RIBO-seq data was necessary to delineate the sORFs. Furthermore, a novel noise filtering algorithm was designed that distinguishes sORFs with true ribosomal activity from simulated noise, thereby reducing the false positive identification rate. The inclusion of other species also led to the development of an inner BLAST pipeline, which assesses sequence similarity between sORFs in the repository. Building on the proof-of-concept model in the initial release of sORFs.org, a full PRIDE-ReSpin pipeline has now been released, which reprocesses publicly available MS-based proteomics PRIDE datasets and reports on true translation events. In addition to reporting the identified peptides, sORFs.org displays the annotated spectra in the Lorikeet MS/MS viewer, enabling detailed manual inspection and interpretation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Base Sequence
  • Caenorhabditis elegans / genetics
  • Caenorhabditis elegans / metabolism
  • Conserved Sequence
  • Databases, Genetic*
  • Datasets as Topic
  • Drosophila melanogaster / genetics
  • Drosophila melanogaster / metabolism
  • Humans
  • Internet
  • Mice
  • Open Reading Frames*
  • Protein Biosynthesis
  • Proteomics / methods*
  • Rats
  • Ribosomes / genetics*
  • Ribosomes / metabolism
  • Sequence Alignment
  • Signal-To-Noise Ratio
  • Software
  • Tandem Mass Spectrometry / statistics & numerical data
  • Zebrafish / genetics
  • Zebrafish / metabolism