A bedr way of genomic interval processing

Source Code Biol Med. 2016 Dec 15:11:14. doi: 10.1186/s13029-016-0059-5. eCollection 2016.

Abstract

Background: Next-generation sequencing is making it critical to robustly and rapidly handle genomic ranges within standard pipelines. Standard use-cases include annotating sequence ranges with gene or other genomic annotation, merging multiple experiments together and subsequently quantifying and visualizing the overlap. The most widely-used tools for these tasks work at the command-line (e.g. BEDTools) and the small number of available R packages are either slow or have distinct semantics and features from command-line interfaces.

Results: To provide a robust R-based interface to standard command-line tools for genomic coordinate manipulation, we created bedr. This open-source R package can use either BEDTools or BEDOPS as a back-end and performs data-manipulation extremely quickly, creating R data structures that can be readily interfaced with existing computational pipelines. It includes data-visualization capabilities and a number of data-access functions that interface with standard databases like UCSC and COSMIC.

Conclusions: bedr package provides an open source solution to enable genomic interval data manipulation and restructuring in R programming language which is commonly used in bioinformatics, and therefore would be useful to bioinformaticians and genomic researchers.

Keywords: BED format; Data integration; Genomic intervals; Sequence algebra.