Multi-omics integration in the age of million single-cell data

Zhen Miao; Benjamin D Humphreys; Andrew P McMahon; Junhyong Kim

doi:10.1038/s41581-021-00463-x

Nat Rev Nephrol. 2021 Nov;17(11):710-724. doi: 10.1038/s41581-021-00463-x. Epub 2021 Aug 20.

Authors

Zhen Miao¹,², Benjamin D Humphreys³, Andrew P McMahon⁴, Junhyong Kim⁵,⁶

Affiliations

¹ Department of Biology, University of Pennsylvania, Philadelphia, PA, USA.
² Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
³ Division of Nephrology, Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA.
⁴ Department of Stem Cell Biology and Regenerative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
⁵ Department of Biology, University of Pennsylvania, Philadelphia, PA, USA. junhyong@sas.upenn.edu.
⁶ Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. junhyong@sas.upenn.edu.

Abstract

An explosion in single-cell technologies has revealed a previously underappreciated heterogeneity of cell types and novel cell-state associations with sex, disease, development and other processes. Starting with transcriptome analyses, single-cell techniques have extended to multi-omics approaches and now enable the simultaneous measurement of data modalities and spatial cellular context. Data are now available for millions of cells, for whole-genome measurements and for multiple modalities. Although analyses of such multimodal datasets have the potential to provide new insights into biological processes that cannot be inferred with a single mode of assay, the integration of very large, complex, multimodal data into biological models and mechanisms represents a considerable challenge. An understanding of the principles of data integration and visualization methods is required to determine what methods are best applied to a particular single-cell dataset. Each class of method has advantages and pitfalls in terms of its ability to achieve various biological goals, including cell-type classification, regulatory network modelling and biological process inference. In choosing a data integration strategy, consideration must be given to whether the multi-omics data are matched (that is, measured on the same cell) or unmatched (that is, measured on different cells) and, more importantly, the overall modelling and visualization goals of the integrated analysis.

Publication types

Review

MeSH terms

Computational Biology
Data Analysis
Data Visualization
Epigenomics
Gene Expression Profiling
Genomics*
Humans
Proteomics
Single-Cell Analysis*

Grants and funding

UC2 DK126024/DK/NIDDK NIH HHS/United States