Dimensionality reduction for visualizing high-dimensional biological data

Biosystems. 2022 Oct:220:104749. doi: 10.1016/j.biosystems.2022.104749. Epub 2022 Jul 30.

Abstract

High-throughput technologies used in the experimental biological sciences rapidly produce data with a vast number of variables, making large volumes of high-dimensional data available. Exploratory analysis of such data can be aided by human-interpretable low-dimensional visualizations. This work investigates how both discrete and continuous structures in biological data can be captured using the recently proposed dimensionality reduction method SONG, and compares the results with those of the commonly used methods UMAP and PHATE. Using simulated and real-world datasets, we observe that SONG produces insightful visualizations by preserving various patterns, including discrete clusters, continuums, and branching structures, in all considered datasets. More importantly, for datasets containing both discrete and continuous structures, SONG preserves both kinds of structure better than UMAP and PHATE. Furthermore, our quantitative evaluation of the three methods using downstream analysis shows that the quality of SONG's low-dimensional embeddings is on par with that of the other methods.

Keywords: Dimensionality-reduction; High-dimensional; Microbial data; Single-cell.