[Automatic clustering method of flow cytometry data based on t-distributed stochastic neighbor embedding]

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2018 Oct 25;35(5):697-704. doi: 10.7507/1001-5515.201802037.
[Article in Chinese]

Abstract

The traditional way to cluster multi-parameter flow cytometry data relies mainly on professional software. An operator manually sets gates to delineate the target cells for analysis, so the process is complex and requires considerable expertise. To address this, this paper proposes a clustering algorithm for multi-parameter flow data based on t-distributed stochastic neighbor embedding (t-SNE). The algorithm converts the Euclidean distances between samples in the high-dimensional space into conditional probabilities that represent similarity, and then maps the data into a low-dimensional space. Stained human peripheral blood cells were measured by flow cytometry, and the processed data were exported as the experimental sample data. The data were reduced with t-SNE and, for comparison, with kernel principal component analysis (KPCA). In both cases the reduced-dimensional data were then clustered with the K-means algorithm. The results show that t-SNE separates cell populations with asymmetric, long-tailed distributions well, with a clustering accuracy that can reach 92.55%. The method may therefore help with the automatic analysis of multi-color, multi-parameter flow data.
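
For reference, the similarity transform described above corresponds to the standard t-SNE formulation. The abstract does not give the authors' settings (perplexity, number of output dimensions, optimizer parameters), so these equations describe the generic algorithm rather than the paper's specific configuration.

```latex
\begin{aligned}
p_{j\mid i} &= \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k\neq i}\exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)}, \qquad p_{i\mid i} = 0,\\
p_{ij} &= \frac{p_{j\mid i} + p_{i\mid j}}{2N},\\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k\neq l}\left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}},\\
C &= \mathrm{KL}(P\,\Vert\,Q) = \sum_{i\neq j} p_{ij}\log\frac{p_{ij}}{q_{ij}}.
\end{aligned}
```

Here x_i are the N high-dimensional samples (for flow data, typically one row of parameter values per cell event) and y_i are their low-dimensional images. Each bandwidth σ_i is chosen so that row i of the conditional distribution has a user-specified perplexity. The cost C is minimized by gradient descent over the y_i, and the heavy-tailed Student-t kernel in q_ij lets moderately distant points sit far apart in the embedding.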

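A minimal NumPy sketch of the first step follows: it turns pairwise Euclidean distances into the conditional probabilities p_{j|i}, calibrating each Gaussian bandwidth by bisection to a target perplexity. The function name `conditional_probabilities`, the default perplexity of 30 and the tolerances are illustrative assumptions, not values from the paper, and a full t-SNE would go on to symmetrize P and optimize the embedding.

```python
import numpy as np


def conditional_probabilities(X, perplexity=30.0, tol=1e-5, max_iter=100):
    """Hypothetical helper (not the authors' code).

    Returns P with P[i, j] = p_{j|i}. Row i uses its own precision
    beta_i = 1 / (2 sigma_i^2), found by bisection so that the entropy
    of row i (in nats) equals log(perplexity).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared Euclidean distances ||x_i - x_j||^2 (clipped at 0 for rounding)
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    target = np.log(perplexity)
    P = np.zeros((n, n))
    for i in range(n):
        d = np.delete(D[i], i)            # distances to all other samples
        beta, lo, hi = 1.0, 0.0, np.inf
        for _ in range(max_iter):
            # Shifting by d.min() cancels in the normalization but avoids underflow
            w = np.exp(-(d - d.min()) * beta)
            p = w / w.sum()
            entropy = -np.sum(p * np.log(np.maximum(p, 1e-300)))
            if abs(entropy - target) < tol:
                break
            if entropy > target:
                # Distribution too flat: narrow the kernel (larger beta)
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:
                # Distribution too peaked: widen the kernel (smaller beta)
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p       # p_{i|i} stays 0
    return P
```

Each row of the result sums to 1 and has perplexity close to the requested value. The entropy is computed in nats, so perplexity is exp(H), which equals the 2^H of the bits-based definition. The standard method then symmetrizes the matrix as p_ij = (p_{j|i} + p_{i|j}) / 2N before optimizing the low-dimensional embedding.
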

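The clustering and evaluation stage can be sketched in the same way. Below, `kmeans` is a plain Lloyd's K-means with k-means++ seeding and restarts, and `clustering_accuracy` scores a labelling under the best one-to-one mapping between clusters and reference populations. Both names are hypothetical. The abstract does not state how the 92.55% accuracy was computed, so this matching-based definition is an assumption. In the paper's pipeline the input `Y` would be the t-SNE or KPCA output rather than synthetic points.

```python
import itertools

import numpy as np


def _sq_dists(Y, C):
    """Squared Euclidean distance from every row of Y to every row of C."""
    return ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans(Y, k, n_init=10, n_iter=100, seed=0):
    """Hypothetical helper: Lloyd's K-means with k-means++ seeding.

    Runs n_init restarts and returns (labels, centroids) from the run
    with the lowest within-cluster sum of squared distances.
    """
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        # k-means++ seeding: each new centroid is drawn with probability
        # proportional to its squared distance from the nearest chosen one
        C = Y[[rng.integers(len(Y))]]
        while len(C) < k:
            d2 = _sq_dists(Y, C).min(axis=1)
            C = np.vstack([C, Y[rng.choice(len(Y), p=d2 / d2.sum())]])
        for _step in range(n_iter):
            labels = _sq_dists(Y, C).argmin(axis=1)
            # Move each centroid to the mean of its points (keep it if empty)
            new_C = np.array([Y[labels == c].mean(axis=0) if np.any(labels == c) else C[c]
                              for c in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        d = _sq_dists(Y, C)
        inertia = d.min(axis=1).sum()
        if best is None or inertia < best[0]:
            best = (inertia, d.argmin(axis=1), C)
    return best[1], best[2]


def clustering_accuracy(true_labels, pred_labels):
    """Hypothetical metric: fraction of points labelled correctly under the
    best one-to-one mapping from cluster ids to reference classes.

    Brute force over permutations, so only suitable for small k; assumes
    the number of clusters does not exceed the number of classes.
    """
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(pred_labels)
    best = 0
    for perm in itertools.permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        mapped = np.array([mapping[c] for c in pred_labels])
        best = max(best, int(np.sum(mapped == true_labels)))
    return best / len(true_labels)
```

With manual gating results as `true_labels`, the same `clustering_accuracy` call would compare t-SNE + K-means against KPCA + K-means on equal terms. The permutation search grows factorially with k. For larger numbers of clusters the usual replacement is the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` applied to the cluster/class contingency table.
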
Keywords: K-means; biomedicine; cell clustering; kernel principal component analysis; t-distributed stochastic neighbor embedding.

Publication types

  • English Abstract

Grants and funding

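National Natural Science Foundation of China (61605010); Ministry of Education Program for Changjiang Scholars and Innovative Research Team (IRT_16R07)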