LSMMD-MA: scaling multimodal data integration for single-cell genomics data analysis

Bioinformatics. 2023 Jul 1;39(7):btad420. doi: 10.1093/bioinformatics/btad420.

Abstract

Motivation: Modality matching in single-cell omics data analysis (i.e. matching cells across datasets collected using different types of genomic assays) has become an important problem, because unifying perspectives across different technologies holds the promise of yielding biological and clinical discoveries. However, single-cell datasets can now reach hundreds of thousands to millions of cells, sizes that remain out of reach for most multimodal computational methods.

Results: We propose LSMMD-MA, a large-scale Python implementation of the MMD-MA method for multimodal data integration. In LSMMD-MA, we reformulate the MMD-MA optimization problem using linear algebra and solve it with KeOps, a CUDA framework for symbolic matrix computation in Python. We show that LSMMD-MA scales to a million cells in each modality, two orders of magnitude more than existing implementations can handle.
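As an illustration of the kind of quantity involved, the following is a minimal NumPy sketch of a squared maximum mean discrepancy (MMD) between two sets of cell embeddings. It is not the LSMMD-MA code and does not use KeOps. The Gaussian kernel, the bandwidth `sigma`, the biased estimator, and the function names `gaussian_kernel` and `mmd2` are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Dense Gaussian kernel matrix between rows of a (n x d) and b (m x d)."""
    # Pairwise squared Euclidean distances via the expansion
    # ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 a_i . b_j
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    sq = np.maximum(sq, 0.0)  # guard against small negative round-off
    return np.exp(-sq / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical embeddings of cells from two modalities in a shared space.
    modality_1 = rng.normal(0.0, 1.0, size=(200, 3))
    modality_2_aligned = rng.normal(0.0, 1.0, size=(200, 3))
    modality_2_shifted = rng.normal(2.0, 1.0, size=(200, 3))
    print("aligned :", mmd2(modality_1, modality_2_aligned))
    print("shifted :", mmd2(modality_1, modality_2_shifted))
```

This dense version materializes every n-by-m kernel matrix in memory, which becomes impractical as the number of cells per modality grows. Symbolic matrix libraries such as KeOps are designed to evaluate reductions of this kind without storing the full matrix.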

Availability and implementation: LSMMD-MA is freely available at https://github.com/google-research/large_scale_mmdma and archived at https://doi.org/10.5281/zenodo.8076311.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Data Analysis
  • Genome*
  • Genomics* / methods
  • Research Design
  • Single-Cell Analysis
  • Software