Background: Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but for large-scale scRNA-seq datasets, PCA takes a long time to compute and consumes large amounts of memory.
Results: In this work, we review the existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace methods and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms.
Conclusion: We develop a guideline for selecting an appropriate PCA implementation according to the differences in the computational environments of users and developers.
Keywords: Cellular heterogeneity; Dimension reduction; Julia; Online/incremental algorithm; Out-of-core; Principal component analysis; Python; R; Randomized algorithm; Single-cell RNA-seq; Sparse data format.