PyIR: a scalable wrapper for processing billions of immunoglobulin and T cell receptor sequences using IgBLAST

BMC Bioinformatics. 2020 Jul 16;21(1):314. doi: 10.1186/s12859-020-03649-5.

Abstract

Background: Recent advances in DNA sequencing technologies have enabled significant leaps in the capacity to generate large volumes of DNA sequence data, which in turn have spurred rapid growth in the use of bioinformatics to interrogate antibody variable gene repertoires. Common tools used for annotation of antibody sequences are often limited in functionality, modularity and usability.

Results: We have developed PyIR, a Python wrapper and library for IgBLAST, which offers a minimal-setup CLI and API, FASTQ support, file chunking for large sequence files, JSON and Python dictionary output, and built-in sequence filtering.
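
The idea of file chunking can be illustrated with a short, generic sketch. This is not PyIR's own implementation or API: the function names read_fasta and chunk_records, the chunk size and the demo records are all hypothetical, chosen only to show how a large sequence file might be split into fixed-size batches that separate worker processes could then annotate independently.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration; multi-line sequences are joined.
    """
    header = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunk_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size items, lazily."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Small in-memory demo; a real run would pass an open file handle instead.
demo = [">seq1", "ACGT", "TTGA", ">seq2", "GGCC", ">seq3", "ATAT"]
chunks = list(chunk_records(read_fasta(demo), chunk_size=2))
```

Because both functions are generators, only one chunk is held in memory at a time, which is the property that makes this kind of approach practical for very large input files.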

Conclusions: PyIR offers improved processing speed over multithreaded IgBLAST (version 1.14) when spawning more than 16 processes on a single computer system. Its customizable filtering and data encapsulation allow it to be adapted to a wide range of computing environments. The API allows IgBLAST to be used in customized bioinformatics workflows.

Keywords: Antibody; CDR3; IgBLAST; Illumina; Immune repertoires.

MeSH terms

  • Base Sequence
  • Humans
  • Immunoglobulins / genetics*
  • Receptors, Antigen, T-Cell / genetics*
  • Sequence Alignment*
  • Sequence Analysis, DNA
  • Software*
  • Time Factors
  • User-Computer Interface

Substances

  • Immunoglobulins
  • Receptors, Antigen, T-Cell