Echidna: integrated simulations of single-cell immune receptor repertoires and transcriptomes

Bioinform Adv. 2022 Sep 2;2(1):vbac062. doi: 10.1093/bioadv/vbac062. eCollection 2022.

Abstract

Motivation: Single-cell sequencing now enables the recovery of full-length immune receptor repertoires [B cell receptor (BCR) and T cell receptor (TCR) repertoires], in addition to gene expression information. The feature-rich datasets produced from such experiments require extensive and diverse computational analyses, each of which can significantly influence the downstream immunological interpretations, such as clonal selection and expansion. Simulations produce validated standard datasets, where the underlying generative model can be precisely defined and furthermore perturbed to investigate specific questions of interest. Currently, there is no tool that can be used to simulate single-cell datasets incorporating immune receptor repertoires and gene expression.

Results: We developed Echidna, an R package that simulates immune receptors and transcriptomes at single-cell resolution with user-tunable parameters controlling a wide range of features such as clonal expansion, germline gene usage, somatic hypermutation, transcriptional phenotypes and spatial location. Echidna can additionally simulate time-resolved B cell evolution, producing mutational networks with complex selection histories incorporating class-switching and B cell subtype information. We demonstrated the benchmarking potential of Echidna by simulating clonal lineages and comparing the known simulated networks with those inferred from only the BCR sequences as input. Finally, we simulated immune repertoire information onto existing spatial transcriptomic experiments, thereby generating novel datasets that could be used to develop and integrate methods to profile clonal selection in a spatially resolved manner. Together, Echidna provides a framework that can incorporate experimental data to simulate single-cell immune repertoires to aid software development and bioinformatic benchmarking of clonotyping, phylogenetics, transcriptomics and machine learning strategies.

Availability and implementation: The R package and code used in this manuscript can be found at github.com/alexyermanos/echidna and also in the R package Platypus (Yermanos et al., 2021). Installation instructions and the vignette for Echidna is described in the Platypus Computational Ecosystem (https://alexyermanos.github.io/Platypus/index.html). Publicly available data and corresponding sample accession numbers can be found in Supplementary Tables S2 and S3.

Supplementary information: Supplementary data are available at Bioinformatics Advances online.