Using Global t-SNE to Preserve Intercluster Data Structure

Yuansheng Zhou; Tatyana O Sharpee

doi:10.1162/neco_a_01504


Neural Comput. 2022 Jul 14;34(8):1637-1651. doi: 10.1162/neco_a_01504.

Authors

Yuansheng Zhou¹,², Tatyana O Sharpee¹,³

Affiliations

¹ Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, U.S.A.
² Division of Biological Sciences, University of California San Diego, La Jolla, CA 92037, U.S.A. Email: yuz461@ucsd.edu
³ Department of Physics, University of California San Diego, La Jolla, CA 92037, U.S.A. Email: sharpee@salk.edu

Abstract

The t-distributed stochastic neighbor embedding (t-SNE) method is one of the leading techniques for data visualization and clustering. This method finds a lower-dimensional embedding of data points while minimizing distortions in distances between neighboring data points. By construction, t-SNE discards information about the large-scale structure of the data. We show that adding a global cost function to the t-SNE cost function makes it possible to cluster the data while preserving global intercluster data structure. We test the new global t-SNE (g-SNE) method on one synthetic data set and two real data sets, describing flower shapes and human brain cells. We find that significant and meaningful global structure exists in both the plant and the human brain data sets. In all cases, g-SNE outperforms t-SNE and UMAP in preserving the global structure. Topological analysis of the clustering result makes it possible to find an appropriate trade-off of data distribution across scales. We find differences in how data are distributed across scales between the two subjects in the human brain data set. Thus, by striving to produce both accurate clustering and accurate positioning between clusters, the g-SNE method can identify new aspects of data organization across scales.
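The central idea above, a t-SNE cost augmented by a global cost term, can be illustrated with a small cost-evaluation sketch in plain Python. It rests on stated assumptions. The local term follows standard t-SNE (KL divergence between Gaussian affinities in data space and Student-t affinities with one degree of freedom in the embedding), but it uses a single fixed bandwidth `sigma` instead of per-point perplexity calibration. The function `global_term`, its distance-growing weight 1 + d², and the trade-off weight `lam` are hypothetical stand-ins chosen for illustration, not the paper's definitions. The sketch only evaluates costs; it does not optimize an embedding.

```python
import math


def sq_dists(points):
    """Squared Euclidean distance between every pair of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]


def joint(weights):
    """Normalise off-diagonal weights into a joint distribution over pairs i != j."""
    n = len(weights)
    total = sum(weights[i][j] for i in range(n) for j in range(n) if i != j)
    return [[weights[i][j] / total if i != j else 0.0 for j in range(n)] for i in range(n)]


def kl(p, q):
    """KL(P || Q) over pairs i != j; pairs with p_ij == 0 contribute nothing."""
    n = len(p)
    return sum(
        p[i][j] * math.log(p[i][j] / q[i][j])
        for i in range(n)
        for j in range(n)
        if i != j and p[i][j] > 0
    )


def local_term(x, y, sigma=1.0):
    """t-SNE-style cost: Gaussian affinities in data space x, Student-t
    (one degree of freedom) affinities in embedding y.
    Simplification: one fixed bandwidth sigma, no perplexity calibration."""
    p = joint([[math.exp(-d / (2 * sigma ** 2)) for d in row] for row in sq_dists(x)])
    q = joint([[1.0 / (1.0 + d) for d in row] for row in sq_dists(y)])
    return kl(p, q)


def global_term(x, y):
    """HYPOTHETICAL global term (not taken from the paper): weights 1 + d^2
    grow with distance, so far-apart (intercluster) pairs dominate."""
    p_hat = joint([[1.0 + d for d in row] for row in sq_dists(x)])
    q_hat = joint([[1.0 + d for d in row] for row in sq_dists(y)])
    return kl(p_hat, q_hat)


def combined_cost(x, y, lam=1.0, sigma=1.0):
    """Local cost plus lam times the global cost; lam is a hypothetical trade-off weight."""
    return local_term(x, y, sigma) + lam * global_term(x, y)


# Three two-point clusters on a line at 0, 10 and 100.
data = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0], [100.0, 0.0], [101.0, 0.0]]
faithful = [list(p) for p in data]
# Same clusters, but the outer two swap places: local neighbourhoods are intact.
swapped = [[100.0, 0.0], [101.0, 0.0], [10.0, 0.0], [11.0, 0.0], [0.0, 0.0], [1.0, 0.0]]

print(abs(local_term(data, swapped) - local_term(data, faithful)) < 1e-9)  # True
print(global_term(data, faithful))  # 0.0
print(global_term(data, swapped) > 0.0)  # True
```

Under these assumptions, the local term is essentially unchanged when the outer clusters trade places, because every within-cluster neighbourhood is preserved, while the hypothetical global term penalizes the swap. This is the kind of intercluster information that a purely local t-SNE cost ignores. Actually fitting an embedding would additionally require gradient descent on `y`, which is omitted here.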
