Pygenomics: manipulating genomic intervals and data files in Python

Bioinformatics. 2023 Jun 1;39(6):btad346. doi: 10.1093/bioinformatics/btad346.

Abstract

Summary: We present pygenomics, a Python package for working with genomic intervals and bioinformatic data files. The package implements interval operations, provides both an API and a CLI, and supports reading and writing data in widely used bioinformatic formats, including BAM, BED, GFF3, and VCF. The source code of pygenomics includes in-source documentation and type annotations and adheres to the functional programming paradigm. These features facilitate integration of pygenomics routines into scripts and pipelines. The package is implemented in pure Python using only the standard library and includes a property-based testing framework. We compare pygenomics with other Python bioinformatic packages in terms of features and performance. The performance comparison covers operations on genomic intervals, read alignments, and genomic variants and demonstrates that pygenomics is suitable for computationally efficient analysis.
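
To illustrate the kind of interval operations the summary refers to, the following is a minimal sketch written with the Python standard library only, in a functional style with type annotations. It is not the pygenomics API: the names Interval, intersect, and merge are hypothetical and chosen for illustration, and the coordinates are assumed to be 0-based and half-open, as in BED.

```python
from typing import Iterable, List, NamedTuple, Optional


class Interval(NamedTuple):
    """Hypothetical interval type (not from pygenomics).

    Coordinates are assumed 0-based and half-open, as in BED.
    """

    chrom: str
    start: int
    end: int


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two intervals, or None if they do not overlap."""
    if a.chrom != b.chrom:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Interval(a.chrom, start, end) if start < end else None


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals on each sequence."""
    merged: List[Interval] = []
    # Named tuples sort by (chrom, start, end), which is the order needed here.
    for cur in sorted(intervals):
        if merged and merged[-1].chrom == cur.chrom and cur.start <= merged[-1].end:
            last = merged[-1]
            # Extend the previous interval instead of mutating it in place.
            merged[-1] = last._replace(end=max(last.end, cur.end))
        else:
            merged.append(cur)
    return merged
```

For example, under these assumptions intersect(Interval("chr1", 0, 10), Interval("chr1", 5, 15)) gives Interval("chr1", 5, 10), and merging that same pair yields a single interval spanning 0 to 15 on chr1.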

Availability and implementation: The source code is available at https://gitlab.com/gtamazian/pygenomics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology*
  • Gene Library
  • Genome
  • Genomics*
  • Software