Rescuing biologically relevant consensus regions across replicated samples

BMC Bioinformatics. 2023 Jun 7;24(1):240. doi: 10.1186/s12859-023-05340-x.

Abstract

Background: In ChIP-seq experiments, protein-DNA binding sites are identified where the binding affinity is significant with respect to a given threshold. The choice of threshold is a trade-off between identifying regions conservatively and discarding weak but true binding sites.

Results: We rescue weak binding sites using MSPC, which efficiently exploits replicates to lower the threshold required to identify a site while keeping a low false-positive rate. We compare it to IDR, a widely used post-processing method for identifying highly reproducible peaks across replicates. We observe several master transcription regulators (e.g., SP1 and GATA3) and HDAC2-GATA1 regulatory networks on rescued regions in the K562 cell line.
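
The abstract does not spell out how evidence from replicates is combined. As an illustration only, the sketch below assumes Fisher's method for combining independent p-values, and shows how two peaks that each miss a stringent threshold can pass it jointly. The function names and both thresholds are hypothetical and are not MSPC's API or defaults.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has a closed form, so no SciPy is needed.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2

    # P(chi2_{2k} > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def is_rescued(pvalues, weak_threshold, stringent_threshold):
    """Return True if every replicate peak passes the permissive (weak)
    threshold and their combined p-value passes the stringent one.

    Both thresholds are illustrative assumptions, not MSPC defaults.
    """
    if any(p > weak_threshold for p in pvalues):
        return False
    return fisher_combined_pvalue(pvalues) <= stringent_threshold


# Hypothetical overlapping peaks from two replicates; each is too weak
# to pass the stringent threshold on its own.
replicate_pvalues = [1e-5, 1e-5]
rescued = is_rescued(replicate_pvalues,
                     weak_threshold=1e-4,
                     stringent_threshold=1e-8)
```

With these made-up inputs, neither peak would be called at the stringent threshold alone, but the combined evidence is strong enough to keep the region. A real analysis would also need to define peak overlap and correct for multiple testing, which this sketch leaves out.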

Conclusions: We argue for the biological relevance of weak binding sites and for the information they add when rescued by MSPC. An implementation of the proposed extended MSPC methodology and the scripts to reproduce the analysis are freely available at https://genometric.github.io/MSPC/. MSPC is distributed as a command-line application and as an R package available from Bioconductor (https://doi.org/doi:10.18129/B9.bioc.rmspc).

Keywords: Biological replicates; Consensus regions; Peak calling; Replicated samples; Technical replicates; Weak binding affinities.

MeSH terms

  • Binding Sites
  • Chromatin Immunoprecipitation Sequencing*
  • Consensus
  • Sequence Analysis, DNA / methods
  • Software*