CSEQ-SIMULATOR: A DATA SIMULATOR FOR CLIP-SEQ EXPERIMENTS

Pac Symp Biocomput. 2016:21:433-44.

Abstract

CLIP-Seq protocols such as PAR-CLIP, HITS-CLIP or iCLIP allow a genome-wide analysis of protein-RNA interactions. Various tools are used to process the resulting short-read data. Some of these tools were developed specifically for CLIP-Seq data, whereas others were designed for the analysis of RNA-Seq data. To date, however, it has not been assessed which of the available tools are most appropriate for the analysis of CLIP-Seq data, because an experimental gold-standard dataset on which methods can be assessed and compared is still not available. To address this gap, we present Cseq-Simulator, a simulator for PAR-CLIP, HITS-CLIP and iCLIP data. The simulator can be used to generate realistic datasets that can serve as surrogates for experimental gold-standard datasets. We also show how Cseq-Simulator can be used to compare the steps of typical CLIP-Seq analysis pipelines, such as read alignment and peak calling. These comparisons show which tools are useful in different settings and help identify pitfalls in the data analysis.

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Computational Biology / statistics & numerical data
  • Computer Simulation
  • Cross-Linking Reagents
  • Genome, Human
  • High-Throughput Nucleotide Sequencing / statistics & numerical data*
  • Humans
  • RNA / genetics
  • RNA / metabolism
  • RNA Processing, Post-Transcriptional
  • RNA-Binding Proteins / metabolism
  • Sequence Alignment / statistics & numerical data
  • Sequence Analysis, RNA / statistics & numerical data*
  • Software*

Substances

  • Cross-Linking Reagents
  • RNA-Binding Proteins
  • RNA