demuxmix: Demultiplexing oligonucleotide-barcoded single-cell RNA sequencing data with regression mixture models

bioRxiv [Preprint]. 2023 Jan 29:2023.01.27.525961. doi: 10.1101/2023.01.27.525961.

Abstract

Motivation: Droplet-based single-cell RNA sequencing (scRNA-seq) is widely used in biomedical research to interrogate the transcriptomes of single cells on a large scale. Pooling and processing cells from different samples together can reduce costs and batch effects. Before pooling, cells are often labeled with hashtag oligonucleotides (HTOs). These HTOs are sequenced along with the cells' RNA in the droplets and are subsequently used to computationally assign each droplet to its sample of origin, a step referred to as demultiplexing. Accurate demultiplexing is crucial but can be challenging due to background HTOs, low-quality cells/cell debris, and multiplets.

Results: A new demultiplexing method, demuxmix, based on negative binomial regression mixture models is introduced. The method implements two significant improvements. First, demuxmix's probabilistic classification framework provides error probabilities for droplet assignments, which can be used to discard uncertain droplets and to assess the quality of the HTO data and the success of the demultiplexing. Second, demuxmix utilizes the positive association between the number of genes detected in the RNA library and HTO counts to explain part of the variance in the HTO data, resulting in improved droplet assignments. The improved performance of demuxmix over existing demultiplexing methods is shown on real and simulated data. Finally, the feasibility of accurately demultiplexing experimental designs where non-labeled cells are pooled with labeled cells is demonstrated.
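
The two ideas in the Results paragraph can be illustrated with a minimal Python sketch. It is not the demuxmix implementation (which is an R package) and does not reproduce its API: the function names, the log-linear form of the regression, and all parameter values in `PARAMS` are assumptions chosen for illustration, and the parameters are set by hand rather than estimated from data. The sketch models one HTO with a two-component negative binomial mixture (background versus labeled), lets each component's mean depend on the number of detected genes, and reports a posterior-based error probability for each droplet.

```python
import math

# Hypothetical, hand-picked parameters for one HTO (not fitted to any data).
PARAMS = {
    "pi_pos": 0.5,                  # prior fraction of droplets carrying this HTO
    "b0_pos": 1.0, "b1_pos": 0.6,   # positive component: log(mean) = b0 + b1 * log(genes)
    "b0_neg": 0.0, "b1_neg": 0.3,   # background component, same functional form
    "theta_pos": 5.0,               # negative binomial size (inverse dispersion)
    "theta_neg": 2.0,
}

def nb_logpmf(y, mu, theta):
    """Log-probability of count y under a negative binomial with mean mu and size theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))

def posterior_positive(y, n_genes, params):
    """Posterior probability that a droplet with HTO count y is labeled with this HTO."""
    log_genes = math.log(n_genes)
    mu_pos = math.exp(params["b0_pos"] + params["b1_pos"] * log_genes)
    mu_neg = math.exp(params["b0_neg"] + params["b1_neg"] * log_genes)
    log_pos = math.log(params["pi_pos"]) + nb_logpmf(y, mu_pos, params["theta_pos"])
    log_neg = math.log(1.0 - params["pi_pos"]) + nb_logpmf(y, mu_neg, params["theta_neg"])
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    m = max(log_pos, log_neg)
    w_pos, w_neg = math.exp(log_pos - m), math.exp(log_neg - m)
    return w_pos / (w_pos + w_neg)

def classify(y, n_genes, params, max_error=0.1):
    """Return (label, error probability); droplets above max_error are flagged as uncertain."""
    p_pos = posterior_positive(y, n_genes, params)
    error = min(p_pos, 1.0 - p_pos)  # probability that the more likely label is wrong
    if error > max_error:
        return "uncertain", error
    return ("positive" if p_pos >= 0.5 else "negative"), error

if __name__ == "__main__":
    # Same HTO count, different numbers of detected genes.
    for genes in (500, 5000):
        print(genes, classify(60, genes, PARAMS))
```

Under these toy parameters, an HTO count of 60 is assigned confidently for a droplet with 500 detected genes but flagged as uncertain for a droplet with 5000 detected genes, because a labeled cell of that size would be expected to carry many more HTO reads. This is the kind of variance the regression term is meant to explain.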

Availability: R/Bioconductor package demuxmix (https://doi.org/doi:10.18129/B9.bioc.demuxmix).

Publication types

  • Preprint